Encoder-decoder `generate()` with forced start for decoder?

astariul · November 18, 2022, 3:06am

I’m training an encoder-decoder model (BART) for spell checking, which I framed as a seq2seq task. For example, given the sentence `I liike thus framewok much.`, the model is trained to predict the corrected sentence: `I like this framework much.`

Now, at inference time, I would like to force the beginning of the decoder's output. For example, for the partial sentence `I like this framewok`:

I want to feed `I like this framewok` to the encoder.
I want to give `I like this` to the decoder and let it predict the next word (in this case, if the model is well trained, it should predict `framework`).

How can I achieve this with the `generate()` method?

nielsr · November 18, 2022, 7:59am

cc’ing @joaogante here

astariul · November 18, 2022, 10:21am

I got something working by using the `decoder_input_ids` keyword argument:

# Assumes `tokenizer` and `model` are an already-loaded BART tokenizer and model
max_len = model.config.max_position_embeddings

# Create the input for the encoder: the sentence with the typo
encoder_inp = tokenizer(["I like this framewok"], max_length=max_len, padding=True, truncation=True, return_tensors="pt")

# Create the input for the decoder: the first part of the sentence, without the typo
decoder_inp = tokenizer(["I like this"], max_length=max_len, padding=True, truncation=True, return_tensors="pt")

# Remove the EOS token appended by the tokenizer, so the decoder can continue the sentence
# (slicing off the last position is valid here because the batch holds a single, unpadded sequence)
decoder_inp["input_ids"] = decoder_inp["input_ids"][:, :-1]

# Call generate() with the encoder's input AND the decoder's initial input
out = model.generate(
    encoder_inp["input_ids"],
    attention_mask=encoder_inp["attention_mask"],
    decoder_input_ids=decoder_inp["input_ids"],
    num_beams=2,
    min_length=1,
    max_length=max_len,
)

This seems to work as I expect, but I’d love to get a second opinion from someone more knowledgeable!

joaogante · November 18, 2022, 10:44am

Hi @astariul – yup, using decoder_input_ids is precisely what you should do
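
A follow-up note for anyone reusing the snippet above: as far as I understand, `generate()` on an encoder-decoder model returns the whole decoder sequence, so the forced prefix is part of the output and has to be sliced off to get only the newly predicted tokens. Depending on the library version, a decoder start token may also be prepended in front of the prefix that was passed in; that is an assumption worth checking on your own setup. The helper below is a minimal sketch of that slicing in plain Python. The name `split_forced_prefix` and the token ids are made up for illustration and are not part of the library.

```python
from typing import List, Sequence, Tuple


def split_forced_prefix(
    generated: Sequence[int], prefix: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Split one generated sequence into (forced part, newly generated ids).

    Tolerates a single extra leading token, in case a decoder start token
    was prepended in front of the prefix that was passed to generate().
    """
    generated, prefix = list(generated), list(prefix)
    for offset in (0, 1):
        end = offset + len(prefix)
        if generated[offset:end] == prefix:
            return generated[:end], generated[end:]
    raise ValueError("generated sequence does not start with the forced prefix")


# Made-up token ids, for illustration only (not real BART vocabulary ids).
prefix_ids = [0, 100, 101, 102]              # stands for "<s> I like this"
output_ids = [2, 0, 100, 101, 102, 103, 2]   # start token + prefix + continuation

forced, continuation = split_forced_prefix(output_ids, prefix_ids)
print(continuation)
```

With the code from the earlier post, this would be called as `split_forced_prefix(out[0].tolist(), decoder_inp["input_ids"][0].tolist())`, and the continuation could then be passed to `tokenizer.decode(..., skip_special_tokens=True)` to read the predicted word.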
