There are several methods, but the simplest is to use TransformersModel instead of HfApiModel and it will work. Note that large models require powerful GPUs, so be careful. That said, the SmolLM model below should run with about 1 GB of VRAM.
Ollama is faster and uses less VRAM, but I think it is a bit harder to set up (compared to TransformersModel, anyway).
from smolagents import TransformersModel

# Runs the model locally via transformers; weights are downloaded on first use
model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")
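If you do go the Ollama route mentioned above, smolagents can talk to it through its LiteLLMModel class. This is just a connection sketch, assuming you have a local Ollama server running on its default port and have already pulled a model (the tag "smollm:135m" here is an example, not something from the original answer):

from smolagents import LiteLLMModel

model = LiteLLMModel(
    model_id="ollama_chat/smollm:135m",  # assumes you ran `ollama pull smollm:135m`
    api_base="http://localhost:11434",   # default local Ollama endpoint
)

Either model object can then be passed to an agent in place of HfApiModel.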