Hi,
I don’t think that’s the recommended way. The recommended way is to pass the torch_dtype argument directly to the pipeline, as follows:
from transformers import pipeline
import torch

# load GPT-2 in half precision (float16)
pipe = pipeline(model="gpt2", torch_dtype=torch.float16)
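Calling the pipeline then works as usual; a minimal sketch (the prompt and the max_new_tokens value are only illustrative):

output = pipe("Hello, my name is", max_new_tokens=20)
print(output[0]["generated_text"])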
If you want to leverage 4-bit or 8-bit quantization instead, you can do that as follows:
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
import torch

# note that 4-bit requires at least one GPU to be available,
# as well as the bitsandbytes library; use load_in_8bit=True for 8-bit instead
model = AutoModelForCausalLM.from_pretrained("gpt2", load_in_4bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
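Note that more recent Transformers versions express the same thing through a BitsAndBytesConfig passed as quantization_config; a minimal sketch, assuming a recent transformers release with bitsandbytes installed:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
import torch

# equivalent 4-bit setup via a quantization config
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained("gpt2", quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)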