google/flan-t5-xxx: unexpected behavior at inference

I call .generate() without setting max_length or max_new_tokens, and on the query
query='What types of car tyres exist in the market?'
it generates only this: [tyres]
This is weird: if I concatenate the output back onto the original query in a loop, I get degenerate, repetitive behavior like this:

0 What types of car tyres exist in the market?

/home/marat/anaconda3/envs/dl/lib/python3.11/site-packages/transformers/generation/utils.py:1494: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
  warnings.warn(

1 What types of car tyres exist in the market? tyres for tyres

2 What types of car tyres exist in the market? tyres for tyres tyres for cars

3 What types of car tyres exist in the market? tyres for tyres tyres for cars tyres for cars

4 What types of car tyres exist in the market? tyres for tyres tyres for cars tyres for cars tyres for cars
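
My guess is that the short, degenerate outputs come from the default generation length rather than from the model itself. A quick sanity check (my assumption: recent transformers versions expose the generation defaults on model.generation_config):

# Assumption: the generation defaults, including max_length,
# live on model.generation_config in recent transformers releases.
print(model.generation_config.max_length)  # I expect the library default of 20

If that prints 20, every .generate() call here is capped at roughly 20 generated tokens unless I override it.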

What am I doing wrong? A llama-2 model behaves much more reasonably with almost the same code.

My code:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

#model_id = "philschmid/flan-t5-xxl-sharded-fp16"
#model_id = "google/flan-t5-small"
#model_id = "google/flan-t5-base"
model_id = "google/flan-t5-large"

model = AutoModelForSeq2SeqLM.from_pretrained(model_id,
                                              load_in_8bit=True,   # 8-bit quantization via bitsandbytes
                                              device_map="auto")   # place the model on available GPUs
tokenizer = AutoTokenizer.from_pretrained(model_id)

query = "What types of car tyres exist in the market?"

print(query)
print('=====')

for i in range(5):
    print(i, query)
    # keep inputs on the same device as the model (this is what the warning above complains about)
    inputs = tokenizer(query, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs)  # no max_new_tokens set, so the default generation length applies
    query += ' ' + tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

print(query)
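
For completeness, this is the variant I plan to try next, a sketch assuming the problem is the default length cap plus greedy decoding (max_new_tokens, no_repeat_ngram_size, do_sample, and temperature are standard .generate() arguments):

inputs = tokenizer(query, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,      # allow a longer answer than the ~20-token default
    no_repeat_ngram_size=3,  # discourage loops like "tyres for tyres"
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.7,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])

Even if these settings help, though, I'd still like to understand why flan-t5 degenerates here while llama-2 does not.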