Adding New Tokens to ViT


I want to add 2 custom embeddings (tokens) to a pretrained transformer.
Each token will represent an argument associated with the image.
I have seen that there is a similar method for BERT and other text transformers (adding a word to the vocabulary), but I did not find anything equivalent for image transformers.

Currently I do it with ugly code that overrides the embeddings of the built-in ViT and adds the tokens.
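
To make the idea concrete, here is a minimal sketch of what such an override might look like in plain PyTorch. It is not my exact code. The `ExtraTokens` module and its parameter names are hypothetical, and it assumes the embedding layer outputs a sequence of shape `(batch, 1 + num_patches, hidden_size)` with the `[CLS]` token first and position embeddings already added. The extra tokens here are plain learnable vectors without position embeddings of their own.

```python
import torch
from torch import nn


class ExtraTokens(nn.Module):
    """Insert learnable tokens right after the [CLS] token of a ViT-style sequence."""

    def __init__(self, hidden_size: int, num_extra_tokens: int = 2):
        super().__init__()
        # Hypothetical parameter: one learnable vector per extra token.
        self.extra_tokens = nn.Parameter(torch.zeros(1, num_extra_tokens, hidden_size))
        nn.init.trunc_normal_(self.extra_tokens, std=0.02)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, 1 + num_patches, hidden_size), [CLS] token first.
        batch_size = embeddings.shape[0]
        # Share the same learnable tokens across every item in the batch.
        extra = self.extra_tokens.expand(batch_size, -1, -1)
        cls_token, patch_tokens = embeddings[:, :1], embeddings[:, 1:]
        # New layout: [CLS], extra tokens, then the patch tokens.
        return torch.cat([cls_token, extra, patch_tokens], dim=1)


# Stand-in for the output of a pretrained ViT embedding layer
# (assumed sizes: 1 [CLS] + 196 patches, hidden size 768).
dummy_embeddings = torch.randn(4, 197, 768)
add_tokens = ExtraTokens(hidden_size=768, num_extra_tokens=2)
out = add_tokens(dummy_embeddings)
print(out.shape)  # torch.Size([4, 199, 768])
```

The longer sequence would then go into the encoder layers, which in principle accept any sequence length. Whether the pretrained weights cope well with the two extra positions is something I have not verified.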
If you have a smarter solution or similar ideas, I would be happy to hear them :hugs: