I have a pretrained T5 model that predicts the solutions of quadratic equations, and I want to use the bertviz library to visualize its attention. In all the examples I found, the input and the output have the same length, but in my case they are different.
tokenizer = PreTrainedTokenizerFast.from_pretrained("my_repo/content")
model = T5ForConditionalGeneration.from_pretrained("my_repo/content", output_attentions=True)
For example, for this input:
inputs = tokenizer("7*x^2+3556*x+451612=0", return_tensors="pt")
The model predicts:
outputs = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask, max_length=80, min_length=10, output_attentions=True, return_dict_in_generate=True)
which produces this sequence:
D = 3556 ^ 2 - 4 * 7 * 4 5 1 6 1 2 = 2 1 ; x 1 = ( - 3556 + ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0 ; x 2 = ( - 3556 - ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0
Its length is 79 tokens, whereas the length of the input is 18.
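For reference, this is how I decode the generated ids and measure the lengths (outputs.sequences holds the generated ids, since I pass return_dict_in_generate=True):
pred_ids = outputs.sequences[0]
print(tokenizer.decode(pred_ids, skip_special_tokens=True))
print(len(pred_ids), len(inputs.input_ids[0]))  # 79 and 18 in my case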
Then I do as in the example:
encoder_input_ids = tokenizer("7*x^2+3556*x+451612=0", return_tensors="pt", add_special_tokens=True).input_ids
with tokenizer.as_target_tokenizer():
    decoder_input_ids = tokenizer(
        "D = 3556 ^ 2 - 4 * 7 * 4 5 1 6 1 2 = 2 1 ; x 1 = ( - 3556 + ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0 ; x 2 = ( - 3556 - ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0",
        return_tensors="pt", add_special_tokens=True
    ).input_ids
encoder_text = tokenizer.convert_ids_to_tokens(encoder_input_ids[0])
decoder_text = tokenizer.convert_ids_to_tokens(decoder_input_ids[0])
model_view(
    cross_attention = outputs.cross_attentions[0],
    encoder_attention = encoder_attention,
    decoder_attention = decoder_attention,
    encoder_tokens = encoder_text,
    decoder_tokens = decoder_text)
However I get this error:
AttributeError: 'tuple' object has no attribute 'shape'
For some reason, the attentions I get come back as tuples (the cross-attention is even a tuple of tuples). The bertviz docs say the dimensions should be like this:
For encoder-decoder models:
encoder_attention: list of ``torch.FloatTensor``(one for each layer) of shape
``(batch_size(must be 1), num_heads, encoder_sequence_length, encoder_sequence_length)``
decoder_attention: list of ``torch.FloatTensor``(one for each layer) of shape
``(batch_size(must be 1), num_heads, decoder_sequence_length, decoder_sequence_length)``
cross_attention: list of ``torch.FloatTensor``(one for each layer) of shape
``(batch_size(must be 1), num_heads, decoder_sequence_length, encoder_sequence_length)``
encoder_tokens: list of tokens for encoder input
decoder_tokens: list of tokens for decoder input
I don’t understand why I am not getting the right dimensions. Is it because my input and output sequences have different lengths?
I trained my model on a custom dataset, and I changed its lm_head.out_features to 1 and its vocab_size to 100_000.
Some info about dimensions:
cross_attention = outputs.cross_attentions
len(cross_attention[0]) = 12
cross_attention[0][0].shape = torch.Size([1, 12, 1, 18])
decoder_attention = outputs.decoder_attentions
len(decoder_attention[0]) = 12
decoder_attention[0][0].shape = torch.Size([1, 12, 1, 1])
encoder_attention = outputs.encoder_attentions
len(encoder_attention) = 12
encoder_attention[0].shape = torch.Size([1, 12, 18, 18])
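Based on these shapes, one thing I tried is stitching the per-step tensors back together along the query axis. This is only a sketch: it works for cross_attentions because the encoder length stays 18 at every step, but decoder_attentions would additionally need padding, since the key length there grows by one with each generated token:
import torch

def stitch_cross_attentions(cross_attentions):
    # cross_attentions: a tuple over generated tokens, each a tuple over the
    # 12 decoder layers, each tensor of shape (1, num_heads, 1, encoder_len)
    num_layers = len(cross_attentions[0])
    # concatenate the steps along dim=2, the decoder (query) positions
    return [
        torch.cat([step[layer] for step in cross_attentions], dim=2)
        for layer in range(num_layers)
    ]

cross = stitch_cross_attentions(outputs.cross_attentions)
# cross[0].shape -> torch.Size([1, 12, num_generated_tokens, 18])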
If I do something like this:
model_view(
    cross_attention = outputs.cross_attentions[0],
    encoder_attention = encoder_attention[0],
    decoder_attention = decoder_attention[0],
    encoder_tokens = encoder_text,
    decoder_tokens = decoder_text)
I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-53-ae747cb8eba0> in <cell line: 1>()
----> 1 model_view(
2 cross_attention = outputs.cross_attentions[0],
3 encoder_attention = encoder_attention[0],
4 decoder_attention = decoder_attention[0],
5 encoder_tokens = encoder_text,
1 frames
/usr/local/lib/python3.9/dist-packages/bertviz/model_view.py in model_view(attention, tokens, sentence_b_start, prettify_tokens, display_mode, encoder_attention, decoder_attention, cross_attention, encoder_tokens, decoder_tokens, include_layers, include_heads, html_action)
128 if include_heads is None:
129 include_heads = list(range(n_heads))
--> 130 encoder_attention = format_attention(encoder_attention, include_layers, include_heads)
131 attn_data.append(
132 {
/usr/local/lib/python3.9/dist-packages/bertviz/util.py in format_attention(attention, layers, heads)
9 # 1 x num_heads x seq_len x seq_len
10 if len(layer_attention.shape) != 4:
---> 11 raise ValueError("The attention tensor does not have the correct number of dimensions. Make sure you set "
12 "output_attentions=True when initializing your model.")
13 layer_attention = layer_attention.squeeze(0)
ValueError: The attention tensor does not have the correct number of dimensions. Make sure you set output_attentions=True when initializing your model.
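For completeness, here is a minimal sketch of the workaround I am considering: running a plain forward pass with the generated text as decoder_input_ids (teacher forcing). As far as I can tell from the transformers docs, a forward pass with output_attentions=True returns one 4-D tensor per layer rather than per-step tuples, which matches the format bertviz expects:
forward_outputs = model(
    input_ids=encoder_input_ids,
    decoder_input_ids=decoder_input_ids,
    output_attentions=True,
)

model_view(
    encoder_attention=forward_outputs.encoder_attentions,  # each layer: (1, 12, 18, 18)
    decoder_attention=forward_outputs.decoder_attentions,  # each layer: (1, 12, tgt_len, tgt_len)
    cross_attention=forward_outputs.cross_attentions,      # each layer: (1, 12, tgt_len, 18)
    encoder_tokens=encoder_text,
    decoder_tokens=decoder_text,
)
Is this the right way to get the attentions into bertviz's expected format, or is there a way to use the outputs of generate() directly?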