I am not passing max_length or max_new_tokens to .generate(), and for the query
query='What types of car tyres exist in the market?'
the model generated only this: [tyres]
This is weird: if I concatenate the output back onto the original query in a loop, the generation gets stuck repeating itself, like this:
0 What types of car tyres exist in the market?
/home/marat/anaconda3/envs/dl/lib/python3.11/site-packages/transformers/generation/utils.py:1494: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
warnings.warn(
1 What types of car tyres exist in the market? tyres for tyres
2 What types of car tyres exist in the market? tyres for tyres tyres for cars
3 What types of car tyres exist in the market? tyres for tyres tyres for cars tyres for cars
4 What types of car tyres exist in the market? tyres for tyres tyres for cars tyres for cars tyres for cars
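If I understand the transformers defaults correctly, .generate() without an explicit length argument falls back to the model's generation_config, where max_length defaults to 20 tokens, which could explain the very short completions. This is just a sketch of how I inspected it (the default value is my reading of the library defaults, not something I verified flan-t5 overrides):

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
# With no max_length/max_new_tokens passed to .generate(), this config applies;
# my understanding is that max_length defaults to 20 for these checkpoints.
print(model.generation_config)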
What am I doing wrong? A llama-2 model with almost the same code behaves much more reasonably.
My code:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

#model_id = "philschmid/flan-t5-xxl-sharded-fp16"
#model_id = "google/flan-t5-small"
#model_id = "google/flan-t5-base"
model_id = "google/flan-t5-large"

# Load the model in 8-bit and let accelerate place it on the GPU
model = AutoModelForSeq2SeqLM.from_pretrained(model_id,
                                              load_in_8bit=True,
                                              device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

query = "What types of car tyres exist in the market?"
print(query)
print('=====')
for i in range(5):
    print(i, query)
    # Note: the inputs stay on the CPU here, which triggers the device warning above
    inputs = tokenizer(query, return_tensors="pt")
    outputs = model.generate(**inputs)
    # Append the decoded answer to the query and feed it back in
    query += ' ' + tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(query)
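For completeness, here is the loop variant I plan to try next. Moving the inputs to the model's device follows the warning's own suggestion; max_new_tokens and no_repeat_ngram_size are standard generate() options that I assume (but have not confirmed) will cap the length and discourage the repeated phrases:

for i in range(5):
    print(i, query)
    # Put the inputs on the same device as the model, per the warning above
    inputs = tokenizer(query, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs,
                             max_new_tokens=64,       # explicit length cap instead of the default
                             no_repeat_ngram_size=3)  # assumption: should block repeated trigrams
    query += ' ' + tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(query)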