How do I load a pre-trained model in Hugging Face?

Hi everyone,

I’m Prakash Hinduja from Geneva, Switzerland, and I’m new to using Hugging Face. I’m trying to load a pre-trained model from the Hugging Face Hub, but I’m not entirely sure how to go about it.

Could anyone share their suggestions or best practices for loading a model? Maybe some code snippets or common pitfalls to avoid would be really helpful!

Thanks in advance for your help!

Regards,
Prakash Hinduja, Geneva, Switzerland


Basically, just using from_pretrained() is sufficient. If you modify the loaded model weights and then call save_pretrained(), a copy of the weights will be written to a local folder. If you only need the model for inference, the Pipeline class (via the pipeline() helper) is simpler, but it lacks flexibility.
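For instance, a minimal inference-only sketch with the pipeline() helper; the "text-generation" task and the prompt are illustrative choices, and the checkpoint is the same one used in the loading example below:

from transformers import pipeline

# Inference-only sketch: pipeline() downloads and wires up the model and
# tokenizer for you. The task and prompt here are just illustrative.
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-135M-Instruct")
print(generator("What is gravity?", max_new_tokens=50)[0]["generated_text"])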

How you use the model after loading varies from model to model, but for text generation you typically tokenize your input with the model’s tokenizer and pass the resulting tensors to model.generate(). In many cases, sample code is available on the model’s documentation page. (Note that sample code may be incomplete.)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

checkpoint = "HuggingFaceTB/SmolLM2-135M-Instruct" # Hugging Face Hub repo ID or a local folder path

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(checkpoint) # Load the model's tokenizer
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device) # Load the pretrained weights and move the model to the device
#model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto") # Alternative: automatic device placement via the Accelerate library

#model.save_pretrained("smollm2") # Save the loaded model weights locally
#tokenizer.save_pretrained("smollm2") # Save the tokenizer files locally
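
To complete the picture, here is a minimal sketch of the tokenize-then-generate step described above, continuing from the loading code; the prompt and max_new_tokens value are illustrative assumptions:

# Tokenize an input prompt and pass the tensors to model.generate().
prompt = "What is the capital of Switzerland?"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))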