Confused about max_length and max_new_tokens

I’m trying to run the example code from flan-t5-small:

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

I’m getting the following warning:

UserWarning: Neither max_length nor max_new_tokens has been set, max_length will default to 20 (generation_config.max_length). Controlling max_length via the config is deprecated and max_length will be removed from the config in v5 of Transformers – we recommend using max_new_tokens to control the maximum length of the generation.

How should I configure this? Is it similar to the OpenAI playground, where the default setting is 256 tokens but the model actually supports 4000?

2 Likes

Hey, was this resolved?

1 Like

No, I’m waiting for a reply.

1 Like

This worked for me without raising the warning:
outputs = model.generate(input_ids, max_length=60)

5 Likes

Just give generate a maximum number of new tokens:
outputs = model.generate(input_ids, max_new_tokens=4000)
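
For reference, here is a sketch of the original flan-t5 example with max_new_tokens set explicitly (the value 64 is arbitrary; generation still stops at the model's EOS token, so this only sets an upper bound):

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Cap only the newly generated tokens; the prompt length does not count against this limit.
outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))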

1 Like

I know I’m late, but for future reference, here is what max_length and max_new_tokens do:
max_length caps the total length of the sequence, i.e. the input tokens plus the generated tokens.
max_new_tokens caps only the newly generated tokens, excluding the input.

Let me show you with the code below.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer('''def print_prime(n):
   """
   Print all primes between 1 and n
   """''', return_tensors="pt", return_attention_mask=False)

# max_length=200 caps the total sequence (prompt + generated tokens)
outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]

# max_new_tokens=200 caps only the generated tokens, on top of the prompt
outputs_2 = model.generate(**inputs, max_new_tokens=200)
text_2 = tokenizer.batch_decode(outputs_2)[0]

# Compare the token counts of the prompt and the two outputs
prompt_tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
output_tokens_1 = tokenizer.convert_ids_to_tokens(outputs[0])
output_tokens_2 = tokenizer.convert_ids_to_tokens(outputs_2[0])
num_prompt_tokens = len(prompt_tokens)
num_output_tokens = len(output_tokens_1)
num_output_tokens_2 = len(output_tokens_2)
print("Number of tokens in prompt:", num_prompt_tokens)
print("Number of tokens from max_length output:", num_output_tokens)
print("Number of tokens from max_new_tokens output:", num_output_tokens_2)

The output I got:

Number of tokens in prompt: 23
Number of tokens from max_length output: 200
Number of tokens from max_new_tokens output: 223
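
In other words (an illustrative check of the numbers above):

num_prompt = 23
assert num_prompt + 200 == 223   # max_new_tokens=200: the 23-token prompt plus 200 generated tokens
assert 200 - num_prompt == 177   # max_length=200: only 177 tokens could be generated after the prompt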

Hope it helps

9 Likes