Newbie here, but I’m working on a GPT-2 project and trying to visualize the attention weights. The docs describe the attentions output as:

Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

That sounds like a flat, 1-D tuple of 4-D tensors, but what I actually get back is a 2-D (nested) tuple of 4-D tensors. What does the second tuple dimension index over?
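Based on that description, I expected to index a single tuple level and get a tensor right away, roughly like this (just a sketch of my expectation; layer_idx is a made-up name):

layer_idx = 0
layer_attn = attentions[layer_idx]  # expected: a 4-D FloatTensor for this layer
print(layer_attn.shape)             # expected: (batch_size, num_heads, sequence_length, sequence_length)

Instead, attentions[layer_idx] turns out to be another tuple.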
This is my code, nothing special:
def generate_text_beam(encoded_seq, num_results=1):
    # output keys: sequences, attentions
    outputs_dict = model.generate(
        encoded_seq,
        num_beams=num_results,
        max_length=200,
        num_return_sequences=num_results,
        no_repeat_ngram_size=1,
        remove_invalid_values=True,
        output_attentions=True,
        return_dict_in_generate=True,
    )
    text = [tokenizer.decode(prediction, skip_special_tokens=True)
            for prediction in outputs_dict['sequences']]
    return (text, outputs_dict['attentions'])
# ...
generated_text, attentions = generate_text_beam(encoded_seq, 1)
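And this is how I’m inspecting the nesting, in case it helps (the comments just record what I observe; the only thing I know for sure is that the innermost tensors are 4-D):

print(len(attentions))         # outer tuple: this is the dimension I can't place
print(len(attentions[0]))      # inner tuple: one entry per layer, presumably
print(attentions[0][0].shape)  # a 4-D tensor, as the docs describe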