I am running inference with Llama-3-8B for text generation. If I pass only one prompt at a time, my code works. However, I have a for loop that iterates over 500 prompts and calls the model once per prompt, so Hugging Face gave me the following warning:
UserWarning: You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
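For context, the working version is just a plain Python loop over the prompts, roughly like this (all_messages is a placeholder for my real list of 500 chats, not my exact code):

# Sequential version: works, but triggers the warning above.
# "all_messages" stands in for my actual list of ~500 chats.
for message in all_messages:
    out = pipe(message, max_new_tokens=200, do_sample=True,
               return_full_text=False, top_k=1)
    print(out[0]["generated_text"])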
I therefore tried the suggestion from the warning, but I'm getting an error and can't figure out what the mistake is. Here's a snippet of my code:
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    pipeline,
)
import torch
from datasets import Dataset
from transformers.pipelines.pt_utils import KeyDataset

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    load_in_8bit=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    temperature=1,
)
message = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
# two copies of the same chat, just to test batching over a dataset
messages = [message, message]
dataset_messages = {"messages": messages}
messages_dataset = Dataset.from_dict(dataset_messages)
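# Sanity check (added while debugging): each dataset row is a list of
# role/content dicts, which I suspect is what preprocess later chokes on.
print(messages_dataset[0]["messages"][0])
# {'role': 'system', 'content': 'You are a friendly chatbot ...'}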
sequences = pipe(
    KeyDataset(messages_dataset, "messages"),
    max_new_tokens=200,
    do_sample=True,
    return_full_text=False,
    top_k=1,
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")
However, I get the following error:
text_generation.py", line 266, in preprocess
prefix + prompt_text,
~~~~~~~^
TypeError: can only concatenate str (not "list") to str
when I call pipe on the Dataset object. From the traceback, it looks like preprocess receives my whole list of message dicts as prompt_text and then fails trying to concatenate it to the string prefix. My datasets version is 2.18.0; I am aware of this issue, but I believe this is no longer caused by that update. Can anyone help me figure out whether I am using datasets correctly?
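The only workaround I can think of (untested, and I'm not sure it is the intended usage) is to flatten each chat into a single prompt string with the tokenizer's chat template before building the dataset, so the pipeline only ever sees strings. A rough sketch, where prompt_dataset and the "prompts" column are names I made up:

# Hypothetical workaround: pre-apply the chat template so every dataset
# row is a plain string instead of a list of role/content dicts.
prompt_strings = [
    tokenizer.apply_chat_template(m, tokenize=False, add_generation_prompt=True)
    for m in messages
]
prompt_dataset = Dataset.from_dict({"prompts": prompt_strings})
sequences = pipe(
    KeyDataset(prompt_dataset, "prompts"),
    max_new_tokens=200,
    do_sample=True,
    return_full_text=False,
    top_k=1,
)

But even if that works, I'd still like to understand why passing the chat-formatted dataset directly fails.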
Thank you so much!