Error Training Vision Encoder Decoder for Image Captioning

Hi,

You were already on the right track! The only “mistake” I see here is that GPT2 doesn’t have a CLS token. The CLS token is only defined for encoder-only Transformers such as BERT and RoBERTa. So in this case, the decoder start token can be set to the BOS (beginning-of-sequence) token instead:

model.config.decoder_start_token_id = tokenizer.bos_token_id
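
If you want the same setup code to work with either kind of decoder tokenizer, here is a minimal sketch of a fallback. Note that `pick_decoder_start_token_id` is a hypothetical helper I made up for illustration, not part of Transformers, and the `SimpleNamespace` objects are just stand-ins for real tokenizers and `model.config`, so the snippet runs without downloading anything. The ids 50256 (GPT2's `<|endoftext|>`) and 101 (`[CLS]` in `bert-base-uncased`) are used as example values.

```python
from types import SimpleNamespace


def pick_decoder_start_token_id(tokenizer):
    """Hypothetical helper: prefer the CLS id, fall back to the BOS id."""
    cls_id = getattr(tokenizer, "cls_token_id", None)
    if cls_id is not None:
        return cls_id
    bos_id = getattr(tokenizer, "bos_token_id", None)
    if bos_id is None:
        raise ValueError("tokenizer defines neither a CLS nor a BOS token")
    return bos_id


# Stand-ins for real tokenizers (assumed attribute values, for illustration only).
gpt2_like = SimpleNamespace(cls_token_id=None, bos_token_id=50256)  # no CLS token
bert_like = SimpleNamespace(cls_token_id=101, bos_token_id=None)    # has a CLS token

# Stand-in for model.config.
config = SimpleNamespace(decoder_start_token_id=None)
config.decoder_start_token_id = pick_decoder_start_token_id(gpt2_like)
print(config.decoder_start_token_id)
```

With a real model you would pass your actual tokenizer and assign the result to `model.config.decoder_start_token_id`, exactly as in the line above.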
