Confused about max_length and max_new_tokens

I know I’m late, but for anyone who finds this later, here is what max_length and max_new_tokens do.

max_length sets the maximum length of the whole sequence, i.e. the input (prompt) tokens plus the generated tokens. max_new_tokens sets the maximum number of newly generated tokens only, excluding the prompt.

Let me show you with code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")


model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer('''def print_prime(n):
   """
   Print all primes between 1 and n
   """''', return_tensors="pt", return_attention_mask=False)

# max_length caps the TOTAL sequence: prompt tokens + generated tokens
outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]

# max_new_tokens caps only the newly generated tokens, on top of the prompt
outputs_2 = model.generate(**inputs, max_new_tokens=200)
text_2 = tokenizer.batch_decode(outputs_2)[0]

prompt_tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
output_tokens_1 = tokenizer.convert_ids_to_tokens(outputs[0])
output_tokens_2 = tokenizer.convert_ids_to_tokens(outputs_2[0])
num_prompt_tokens = len(prompt_tokens)
num_output_tokens_1 = len(output_tokens_1)
num_output_tokens_2 = len(output_tokens_2)
print("Number of tokens in prompt:", num_prompt_tokens)
print("Number of tokens from max_length output:", num_output_tokens_1)
print("Number of tokens from max_new_tokens output:", num_output_tokens_2)

Here is the output I got:

Number of tokens in prompt: 23
Number of tokens from max_length output: 200
Number of tokens from max_new_tokens output: 223
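
So with a 23-token prompt, max_length=200 leaves room for at most 200 − 23 = 177 new tokens, while max_new_tokens=200 allows up to 200 new tokens on top of the prompt, for 23 + 200 = 223 tokens in total.

One related trick: since generate() returns the prompt tokens followed by the new tokens, you can slice the prompt off before decoding if you only want the generated text. A minimal sketch, reusing inputs and outputs_2 from the code above:

num_prompt_tokens = inputs['input_ids'].shape[1]
new_tokens_only = outputs_2[0][num_prompt_tokens:]  # keep only the generated part
print(tokenizer.decode(new_tokens_only, skip_special_tokens=True))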

Hope it helps
