What does the decoder with past values mean?

Rkoy · August 1, 2022, 9:47am

Those three parts consist of the encoder, the “decoder” (which actually consists of the decoder with the language modeling head), and the “decoder” with pre-computed key/values as additional inputs. This specific export comes from the fact that during the first pass, the decoder has no pre-computed key/values hidden-states, while during the rest of the generation past key/values will be used to speed up sequential decoding.

Can you explain in detail the difference between the decoder with the LM head and the decoder with the pre-computed key/values, as mentioned in the passage above? I find both of them confusing.

I would also like to know where decoder_with_LM_Head and decoder_with_Past_head are used during inference.

echarlaix · August 5, 2022, 3:38pm

Hi @Rkoy,

Since #241, it is possible to export only one decoder, which does not take pre-computed key/values as inputs. As a result, the key/values for all previous tokens are recomputed at every generation step. To enable this export, you only need to set use_cache to False when calling the from_pretrained method. To speed up decoding by reusing the key/value hidden states already computed in previous generation steps, you need to export a second decoder that takes the pre-computed key/values as additional inputs.
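To make the difference concrete, here is a minimal toy sketch in plain Python. It is not the Optimum implementation: `project`, `attend`, `pick_token`, `decoder` and `decoder_with_past` are hypothetical stand-ins made up for illustration. It only shows the idea that the decoder without past handles the first step (no cache exists yet) and the decoder with past handles every later step by projecting just the newest token and reusing the cached key/values.

```python
import math

def project(token):
    """Toy stand-in for the key/value projections of one token id (hypothetical)."""
    key = [math.sin(token + i) for i in range(4)]
    value = [math.cos(token * (i + 1)) for i in range(4)]
    return key, value

def attend(query, keys, values):
    """Single-head dot-product attention of one query over all keys/values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]

def pick_token(hidden):
    """Toy stand-in for the LM head: maps a hidden state to a token id in 0..9."""
    return int(abs(sum(hidden)) * 100) % 10

def decoder(tokens):
    """Decoder without past: recomputes key/values for every token so far."""
    past = [project(t) for t in tokens]
    keys = [k for k, _ in past]
    values = [v for _, v in past]
    return attend(keys[-1], keys, values), past  # toy: last key doubles as the query

def decoder_with_past(new_token, past):
    """Decoder with past: projects only the new token and reuses the cache."""
    past = past + [project(new_token)]
    keys = [k for k, _ in past]
    values = [v for _, v in past]
    return attend(keys[-1], keys, values), past

def generate_cached(start, steps):
    tokens = [start]
    hidden, past = decoder(tokens)  # first step: no cache exists yet
    tokens.append(pick_token(hidden))
    for _ in range(steps - 1):  # later steps: reuse and extend the cache
        hidden, past = decoder_with_past(tokens[-1], past)
        tokens.append(pick_token(hidden))
    return tokens

def generate_uncached(start, steps):
    tokens = [start]
    for _ in range(steps):  # every step recomputes all key/values from scratch
        hidden, _ = decoder(tokens)
        tokens.append(pick_token(hidden))
    return tokens

# Both strategies produce the same tokens; only the amount of recomputation differs.
print(generate_cached(3, 5) == generate_uncached(3, 5))
```

In this sketch, `generate_uncached` corresponds to exporting a single decoder with use_cache set to False, while `generate_cached` corresponds to the two-decoder export: one call to the plain decoder, then the decoder with past for the rest of the generation.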
