Use decoder_input_ids with deepspeed

rlight · May 9, 2023, 9:26pm

Hello,
This is a question about how to use decoder_input_ids with deepspeed.
I have a language generation task that requires fixed decoder_input_ids. For example, I need the model to start its output with ‘yes’.
My current approach is to supply them through inputs['decoder_input_ids'] within Seq2SeqTrainer so that they reach model.generate(). Is this the right way?

Edit: It doesn’t seem to work that way. I suspect that trainer.py does not pass decoder_input_ids through as an argument in its model(*input) call. Any suggestions?
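
For reference, here is a minimal sketch of the prefix I am trying to force, assuming an encoder-decoder model whose generate() accepts decoder_input_ids as a keyword argument. The helper name build_decoder_prefix and the token ids (0 for the decoder start token, 4273 for 'yes') are made up for illustration; the real ids would come from model.config.decoder_start_token_id and from tokenizing 'yes' without special tokens. The generate() call is left as a comment because it needs torch, a loaded model, and a batch, and I have not confirmed whether Seq2SeqTrainer forwards this argument.

```python
from typing import List, Sequence


def build_decoder_prefix(
    decoder_start_token_id: int,
    prefix_token_ids: Sequence[int],
    batch_size: int,
) -> List[List[int]]:
    """Return one row per sample: [decoder_start_token_id, *prefix_token_ids]."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    row = [decoder_start_token_id, *prefix_token_ids]
    # Copy the row for every sample so the rows are independent lists.
    return [list(row) for _ in range(batch_size)]


# Hypothetical ids: 0 as the decoder start token, 4273 standing in for 'yes'.
prefix = build_decoder_prefix(
    decoder_start_token_id=0,
    prefix_token_ids=[4273],
    batch_size=2,
)

# Assumed usage outside the trainer (requires torch and a loaded seq2seq model):
#   outputs = model.generate(
#       input_ids=batch["input_ids"],
#       attention_mask=batch["attention_mask"],
#       decoder_input_ids=torch.tensor(prefix, device=model.device),
#   )
```

Calling generate() directly like this, outside the trainer, would at least show whether the forced prefix works on its own, which separates a model issue from a trainer or deepspeed issue.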
