I’m trying to run the example code from the flan-t5 model card:
from transformers import T5Tokenizer, T5ForConditionalGeneration
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")
input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
I’m getting the following warning:
UserWarning: Neither max_length nor max_new_tokens has been set, max_length will default to 20 (generation_config.max_length). Controlling max_length via the config is deprecated and max_length will be removed from the config in v5 of Transformers – we recommend using max_new_tokens to control the maximum length of the generation.
How should I configure this? Is it like the OpenAI Playground, where the default setting is 256 tokens but the model actually supports 4000?
I know I’m late, but for anyone who comes across this later, here is what max_length and max_new_tokens do:
max_length caps the total sequence length, counting both the prompt tokens and the generated tokens.
max_new_tokens caps only the number of newly generated tokens, excluding the prompt.
Let me show you with code:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer('''def print_prime(n):
   """
   Print all primes between 1 and n
   """''', return_tensors="pt", return_attention_mask=False)

# max_length counts prompt tokens + generated tokens together
outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]

# max_new_tokens counts only the generated tokens, on top of the prompt
outputs_2 = model.generate(**inputs, max_new_tokens=200)
text_2 = tokenizer.batch_decode(outputs_2)[0]

prompt_tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
output_tokens_1 = tokenizer.convert_ids_to_tokens(outputs[0])
output_tokens_2 = tokenizer.convert_ids_to_tokens(outputs_2[0])

print("Number of tokens in prompt:", len(prompt_tokens))
print("Number of tokens from max_length output:", len(output_tokens_1))
print("Number of tokens from max_new_tokens output:", len(output_tokens_2))
Here is the output I got:
Number of tokens in prompt: 23
Number of tokens from max_length output: 200
Number of tokens from max_new_tokens output: 223
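The numbers line up with the explanation above: with max_length=200, the prompt and the generation together are capped at 200 tokens, while with max_new_tokens=200 the model can generate up to 200 tokens on top of the 23 prompt tokens (23 + 200 = 223).

For the original flan-t5 example, the warning goes away once you pass max_new_tokens explicitly. The default of 20 mentioned in the warning is just a conservative value from the model's generation_config, not a hard model limit. Here is a minimal sketch; the budget of 100 new tokens is an arbitrary choice for illustration, not something from the model card:

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

input_ids = tokenizer("translate English to German: How old are you?", return_tensors="pt").input_ids

# Cap only the generated tokens; 100 here is an arbitrary budget
outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

If you would rather set it once instead of on every call, you can also assign it on the model's generation config, e.g. model.generation_config.max_new_tokens = 100.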