Hello,
I have a question about one part of BartModel's forward method:
```python
    decoder_padding_mask,
    decoder_causal_mask=causal_mask,
    past_key_values=past_key_values,
    use_cache=use_cache,
    output_attentions=output_attentions,
    output_hidden_states=output_hidden_states,
    return_dict=return_dict,
)

if not return_dict:
    return decoder_outputs + encoder_outputs

return Seq2SeqModelOutput(
    last_hidden_state=decoder_outputs.last_hidden_state,
    past_key_values=decoder_outputs.past_key_values,
    decoder_hidden_states=decoder_outputs.hidden_states,
    decoder_attentions=decoder_outputs.attentions,
    cross_attentions=decoder_outputs.cross_attentions,
    encoder_last_hidden_state=encoder_outputs.last_hidden_state,
    encoder_hidden_states=encoder_outputs.hidden_states,
    encoder_attentions=encoder_outputs.attentions,
)
```
Why does BartModel return decoder_outputs + encoder_outputs when return_dict is False, instead of just decoder_outputs?
Thank you.
Because during generation you usually run the encoder only once and reuse its output for every subsequent token-generation step, for efficiency. That means you need to be able to access the encoder output from that first forward pass later on, so the model returns it alongside the decoder output.
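To make that concrete, here is a minimal sketch (not the actual generate() implementation, and assuming the facebook/bart-base checkpoint purely for illustration) of how the encoder output is computed once and then reused while only the decoder is re-run for each new token:

```python
import torch
from transformers import BartModel, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartModel.from_pretrained("facebook/bart-base")
model.eval()

inputs = tokenizer("Hello world", return_tensors="pt")

# The encoder runs exactly once; its output is cached and reused below.
encoder_outputs = model.get_encoder()(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
)

# Start decoding from the model's decoder start token.
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])

for _ in range(3):
    outputs = model(
        attention_mask=inputs["attention_mask"],
        decoder_input_ids=decoder_input_ids,
        encoder_outputs=encoder_outputs,  # reused, not recomputed
        return_dict=True,
    )
    # outputs.last_hidden_state would normally feed an LM head to choose the
    # next token; here we just repeat the last id to show the loop structure.
    decoder_input_ids = torch.cat(
        [decoder_input_ids, decoder_input_ids[:, -1:]], dim=-1
    )
```

With return_dict=False, concatenating the two output tuples serves the same purpose: the caller can index into the combined tuple to recover the encoder's hidden states later without re-running the encoder.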