Hi, I was playing around with facebook/opt-1.3b and I noticed that it sometimes generates token IDs larger than the tokenizer's vocab_size.
Indeed, running this code:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "facebook/opt-1.3b"
tokenizer_name = "facebook/opt-1.3b"

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

print(model.lm_head)
print(tokenizer.vocab_size)
```
it seems that the output size of the lm_head layer is larger than the vocab_size:
```
Linear(in_features=2048, out_features=50272, bias=False)
50265
```
Am I using opt-1.3b wrong? Is this supposed to happen for some reason? Should I just ignore the logits after the 50265th?
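For what it's worth, the same mismatch shows up if I compare the model side and the tokenizer side directly. A minimal check, using only the `model` and `tokenizer` objects loaded above:

```python
# Model-side vocabulary size: for facebook/opt-1.3b this is 50272,
# matching the lm_head out_features printed above.
print(model.config.vocab_size)
print(model.get_input_embeddings().num_embeddings)  # rows in the input embedding matrix

# Tokenizer-side vocabulary size: 50265 as printed above
# (len(tokenizer) additionally counts any added special tokens).
print(tokenizer.vocab_size)
print(len(tokenizer))
```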
Edit: this happens when using generate with these arguments:
```python
outputs = self.model.generate(
    tokens.unsqueeze(0),
    return_dict_in_generate=True,
    output_scores=True,
    max_new_tokens=1,
    max_length=None,
)
```
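To show what I mean by out-of-range tokens, here is roughly how I inspect the output. This is a self-contained sketch, not my exact code: the prompt string is just an example, and `model`/`tokenizer` are the objects loaded in the first snippet instead of `self.model`/`tokens`.

```python
# Encode a prompt and generate one token, keeping the per-step scores.
tokens = tokenizer("Hello, my name is", return_tensors="pt").input_ids[0]
outputs = model.generate(
    tokens.unsqueeze(0),
    return_dict_in_generate=True,
    output_scores=True,
    max_new_tokens=1,
    max_length=None,
)

# outputs.scores is a tuple with one (batch, vocab) tensor per generated step;
# its last dimension is 50272 (the lm_head size), not tokenizer.vocab_size.
print(outputs.scores[0].shape)

# Check whether any newly generated ID falls outside the tokenizer's vocabulary.
new_ids = outputs.sequences[0, tokens.shape[0]:]
print(new_ids, (new_ids >= tokenizer.vocab_size).any())
```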