Hi all, thank you so much for the wonderful service.
I have some doubts regarding the training details for MM-Imdb dataset.
Are the image encoders and tokenizer embeddings fine-tuned during training on MM-Imdb dataset? If not, can you suggest a way to do it or refer any material for help?
Is there a way to modify the code so that the model’s pre-trained weights can be used for sequence-to-sequence generations tasks instead of classification?
Any suggestions or comments will be of great help.