Attempting to generate text, but it's too slow

I’m currently attempting to learn how to generate text with LLMs.
Unfortunately, it’s running very slowly, and I suspect I may have messed up.
I’m using an Nvidia Tesla M10 (I know, it’s not the latest and greatest :wink: ).

Here is my code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers

# Update the model name according to Phi-3
phi3_model_name = "unsloth/Phi-3-mini-4k-instruct"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(phi3_model_name)
model = AutoModelForCausalLM.from_pretrained(phi3_model_name, device_map="auto", load_in_4bit=True)

# Create the text generation pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,  # Specify the PyTorch datatype
    tokenizer=tokenizer,
)

# Loop to obtain and generate text based on user input
while True:
    prompt = "Write text about a fishing boat"
    sequences = pipeline(
        prompt,
        do_sample=True,
        min_length=50,
        max_length=150,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        temperature=0.2,
        top_p=0.95,
        top_k=40,
        num_beams=4,
    )

    for sequence in sequences:
        print(sequence['generated_text'])

Edit: This might be confusing, but I originally had prompt = input("…") in the code, which is why there is a while True: loop.
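
To put a number on "very slowly", here is a rough sketch of how one generation call could be timed. The helper `measure_throughput` and the stand-in functions `fake_generate` and `count_words` are hypothetical names made up for illustration, not part of transformers; they only show the measurement idea and run without a GPU or model.

```python
import time
from typing import Callable


def measure_throughput(
    generate: Callable[[str], str],
    count_tokens: Callable[[str], int],
    prompt: str,
) -> dict:
    """Time a single generation call and estimate new tokens per second.

    `generate` is assumed to return the prompt plus the continuation,
    which is what the text-generation pipeline does by default.
    """
    start = time.perf_counter()
    text = generate(prompt)
    elapsed = time.perf_counter() - start

    # Subtract the prompt so only newly generated tokens are counted.
    new_tokens = max(count_tokens(text) - count_tokens(prompt), 0)
    rate = new_tokens / elapsed if elapsed > 0 else float("inf")
    return {"seconds": elapsed, "new_tokens": new_tokens, "tokens_per_second": rate}


# Hypothetical stand-ins so the sketch runs on its own.
def fake_generate(prompt: str) -> str:
    time.sleep(0.01)  # pretend the model is working
    return prompt + " a b c"


def count_words(text: str) -> int:
    return len(text.split())  # crude whitespace "tokenizer"


stats = measure_throughput(fake_generate, count_words, "Write text about a fishing boat")
print(stats)

# With the real objects from the code above, the callables could look like
# this (assumption: default pipeline output format):
#   generate     = lambda p: pipeline(p, max_length=150)[0]["generated_text"]
#   count_tokens = lambda s: len(tokenizer(s)["input_ids"])
```

Running this once with the settings above and once with a changed setting (for example a different `num_beams`) would show which parameter accounts for most of the time.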