I’m currently attempting to learn how to generate text with LLMs.
Unfortunately, it’s running very slowly, and I suspect I may have messed up.
I’m using an Nvidia Tesla M10 (I know, it’s not the latest and greatest).
Here is my code:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers

# Update the model name according to Phi-3
phi3_model_name = "unsloth/Phi-3-mini-4k-instruct"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(phi3_model_name)
model = AutoModelForCausalLM.from_pretrained(phi3_model_name, device_map="auto", load_in_4bit=True)

# Create the text generation pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,  # Specify the PyTorch datatype
    tokenizer=tokenizer,
)
# Loop to obtain and generate text based on user input
while True:
    prompt = "Write text about a fishing boat"
    sequences = pipeline(
        prompt,
        do_sample=True,
        min_length=50,
        max_length=150,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        temperature=0.2,
        top_p=0.95,
        top_k=40,
        num_beams=4,
    )
    for sequence in sequences:
        print(sequence['generated_text'])
Edit: this might be confusing, but in my original code I had prompt = input("…"), which is why the while True: loop is there.
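In case it helps, here is a minimal sketch of what the interactive version looked like (the exact input() message is just a placeholder, not my original wording; the generation settings are the same as in the code above):

# Sketch of the interactive loop; the input() prompt text is a placeholder
while True:
    prompt = input("Enter a prompt: ")  # read a fresh prompt from the user each iteration
    sequences = pipeline(prompt, do_sample=True, max_length=150)  # same sampling/beam settings as above in my real code
    for sequence in sequences:
        print(sequence['generated_text'])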