I am looking to retrain the Vision-Language model VisualBERT on a specific dataset (images + text). How can I do this?
Out of the box, VisualBERT is pretrained on the COCO dataset, and I'd like to continue training it on my own data so that it retains the originally learned parameters while being updated with the new data.
I'm unsure which section of the code I need to change.
A configuration (or model) is typically loaded from a Hugging Face Hub repository, like dandelin/vilt-b32-finetuned-vqa in this case. The dataset needs to be prepared separately, as shown in the notebook.
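For VisualBERT specifically, here is a minimal sketch of what continued training might look like. It assumes the `uclanlp/visualbert-vqa-coco-pre` checkpoint and assumes you extract region features for each image with an external object detector (e.g. a Faster R-CNN, as in the original VisualBERT setup), since `transformers` does not ship a feature extractor for VisualBERT. The random `visual_embeds` tensor below is only a placeholder with the expected shape:

```python
import torch
from torch.optim import AdamW
from transformers import BertTokenizer, VisualBertForPreTraining

# from_pretrained loads the existing COCO-pretrained weights; the training
# step below only updates them, it does not reinitialize the model.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = VisualBertForPreTraining.from_pretrained("uclanlp/visualbert-vqa-coco-pre")

# One (image, text) pair from your dataset. Real visual features would come
# from your detector; this random tensor is a stand-in with the expected
# shape (batch, num_regions, visual_embedding_dim = 2048 for this checkpoint).
text = "A caption from your dataset"
inputs = tokenizer(text, return_tensors="pt")
visual_embeds = torch.randn(1, 36, 2048)  # placeholder, not real features
inputs["visual_embeds"] = visual_embeds
inputs["visual_token_type_ids"] = torch.ones(visual_embeds.shape[:-1], dtype=torch.long)
inputs["visual_attention_mask"] = torch.ones(visual_embeds.shape[:-1], dtype=torch.float)

# Masked-LM-style labels over the full (text + visual) sequence; -100 marks
# the visual positions so the loss ignores them. In practice you would mask
# only a fraction of the text tokens rather than predicting all of them.
labels = torch.cat(
    [inputs["input_ids"],
     torch.full(visual_embeds.shape[:-1], -100, dtype=torch.long)],
    dim=1,
)

optimizer = AdamW(model.parameters(), lr=5e-5)
model.train()
outputs = model(**inputs, labels=labels)
outputs.loss.backward()  # one gradient step; wrap this in a DataLoader loop
optimizer.step()
```

In a real run you would wrap the forward/backward step in a loop over a `DataLoader` that yields tokenized captions plus the precomputed detector features for each image; the key point is that loading via `from_pretrained` and then training keeps the original parameters as the starting point rather than training from scratch.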