decoder_start_token_id per sample or per batch during training

Hi All,

I’m working on a translation model using EncoderDecoderModel.
Each batch contains samples from only one language.
How can we set decoder_start_token_id per batch?
Sometimes the start token is <ENG> and sometimes it’s <FRE>.
Currently I mutate model.module.config.decoder_start_token_id before each batch, but this causes a lot of inconsistency when training with multiple GPUs and DeepSpeed.
It seems the forward function does not accept a decoder_start_token_id argument; only generate does. However, I need to specify decoder_start_token_id per batch during training.
Any suggestions?
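For context, here is a minimal sketch of the workaround I’m considering: building decoder_input_ids manually per batch (prepending the batch’s language token and right-shifting the labels) and passing them to forward explicitly, so the model never reads config.decoder_start_token_id. The token IDs and the shift_right helper below are illustrative assumptions, not the library’s internals:

```python
import torch

# Hypothetical per-language start-token IDs (illustrative; the real IDs
# would come from the tokenizer, e.g. tokenizer.convert_tokens_to_ids("<ENG>")).
LANG_START_IDS = {"eng": 5, "fre": 6}
PAD_ID = 0

def shift_right(labels: torch.Tensor, start_id: int, pad_id: int = PAD_ID) -> torch.Tensor:
    """Build decoder_input_ids by prepending a start token and dropping
    the last label position."""
    decoder_input_ids = labels.new_full(labels.shape, pad_id)
    decoder_input_ids[:, 1:] = labels[:, :-1]
    decoder_input_ids[:, 0] = start_id
    return decoder_input_ids

# Each batch is monolingual, so the start token is chosen per batch:
labels = torch.tensor([[11, 12, 13], [14, 15, PAD_ID]])
decoder_input_ids = shift_right(labels, LANG_START_IDS["fre"])
# outputs = model(input_ids=..., attention_mask=...,
#                 decoder_input_ids=decoder_input_ids, labels=labels)
```

Since decoder_input_ids is passed explicitly, this avoids touching the config at all, but I’m not sure it plays well with label smoothing or other trainer internals.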

Thank you