Newbie here, but I'm working on a GPT-2 project and trying to visualize the attention weights. The docs say the attention output is a tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`.
That sounds like a flat, 1-D tuple of 4-D tensors, but what I actually get back is a 2-D tuple (a tuple of tuples) of 4-D tensors; see the shape check after my code below. What does the second tuple dimension correspond to?
This is my code, nothing special:
```python
def generate_text_beam(encoded_seq, num_results=1):
    # outputs_dict keys: sequences, attentions
    outputs_dict = model.generate(
        encoded_seq,
        num_beams=num_results,
        max_length=200,
        num_return_sequences=num_results,
        no_repeat_ngram_size=1,
        remove_invalid_values=True,
        output_attentions=True,
        return_dict_in_generate=True,
    )
    text = [
        tokenizer.decode(prediction, skip_special_tokens=True)
        for prediction in outputs_dict['sequences']
    ]
    return (text, outputs_dict['attentions'])

# ...
generated_text, attentions = generate_text_beam(encoded_seq, 1)
```
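To show what I mean, this is how I'm inspecting the result (just print statements on the `attentions` object returned above; the comments describe what I see on my machine, so treat them as my observations rather than documented behaviour):

```python
# Probing the structure of `attentions` returned by generate_text_beam above.
print(type(attentions), len(attentions))        # tuple -- this outer length is the extra dimension I'm asking about
print(type(attentions[0]), len(attentions[0]))  # also a tuple; for me its length equals model.config.n_layer (12 for GPT-2)
print(attentions[0][0].shape)                   # a 4-D torch.FloatTensor, matching the documented shape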