Confused about max_length and max_new_tokens

I’m trying to run the example code from flan-t5-small:

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

I’m getting the following warning:

UserWarning: Neither max_length nor max_new_tokens has been set, max_length will default to 20 (generation_config.max_length). Controlling max_length via the config is deprecated and max_length will be removed from the config in v5 of Transformers – we recommend using max_new_tokens to control the maximum length of the generation.

How should I configure this? Is it similar to the OpenAI playground, where the default setting is 256 tokens but the model actually supports 4000?

2 Likes

Hey, was this resolved?

1 Like

No, I’m waiting for a reply.

1 Like

This worked for me without raising the warning:
outputs = model.generate(input_ids, max_length=60)

5 Likes

Just give generate a maximum number of new tokens:
outputs = model.generate(input_ids, max_new_tokens=4000)
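
For reference, here is a sketch of the original flan-t5 example with max_new_tokens set explicitly (the value 64 is arbitrary; generation still stops at the model's EOS token, so this only sets an upper bound):

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Cap only the newly generated tokens; the prompt length does not count against this limit.
outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))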

1 Like

I know I’m late, but for future reference, here is what max_length and max_new_tokens do:
max_length caps the total length of the sequence, i.e. the input tokens plus the generated tokens.
max_new_tokens caps only the newly generated tokens, excluding the input.

Let me show you with the code below.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer('''def print_prime(n):
   """
   Print all primes between 1 and n
   """''', return_tensors="pt", return_attention_mask=False)

# max_length=200 caps the total sequence (prompt + generated tokens)
outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]

# max_new_tokens=200 caps only the generated tokens, on top of the prompt
outputs_2 = model.generate(**inputs, max_new_tokens=200)
text_2 = tokenizer.batch_decode(outputs_2)[0]

# Compare the token counts of the prompt and the two outputs
prompt_tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
output_tokens_1 = tokenizer.convert_ids_to_tokens(outputs[0])
output_tokens_2 = tokenizer.convert_ids_to_tokens(outputs_2[0])
num_prompt_tokens = len(prompt_tokens)
num_output_tokens = len(output_tokens_1)
num_output_tokens_2 = len(output_tokens_2)
print("Number of tokens in prompt:", num_prompt_tokens)
print("Number of tokens from max_length output:", num_output_tokens)
print("Number of tokens from max_new_tokens output:", num_output_tokens_2)

The output I got:

Number of tokens in prompt: 23
Number of tokens from max_length output: 200
Number of tokens from max_new_tokens output: 223
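
In other words (an illustrative check of the numbers above):

num_prompt = 23
assert num_prompt + 200 == 223   # max_new_tokens=200: the 23-token prompt plus 200 generated tokens
assert 200 - num_prompt == 177   # max_length=200: only 177 tokens could be generated after the prompt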

Hope it helps

9 Likes