I have been running everything locally by using either the TransformersModel or MLXModel classes instead of the HfApiModel class.
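For reference, here is a minimal sketch of the TransformersModel route; the model id and the device_map/max_new_tokens settings are illustrative choices on my part, not a prescribed configuration:

from smolagents import CodeAgent, TransformersModel

# Runs the model in-process via transformers; device_map="auto" spreads
# the weights across whatever GPU/CPU memory is available.
model = TransformersModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",  # illustrative model choice
    device_map="auto",
    max_new_tokens=4096,
)

agent = CodeAgent(tools=[], model=model, add_base_tools=True)
agent.run("Could you give me the 40th number in the Fibonacci sequence?")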
from smolagents import CodeAgent, MLXModel

model = MLXModel(
    "mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
    # Sampling options are forwarded to mlx-lm's generation call;
    # exact parameter support varies by mlx-lm version.
    temperature=0.7,
    top_k=20,
    top_p=0.8,
    min_p=0.05,
    # num_ctx=32768 is an Ollama-style option, not an mlx-lm one;
    # MLX models take their context length from the model config.
)
agent = CodeAgent(tools=[], model=model, add_base_tools=True)
agent.run(
"Could you give me the 40th number in the Fibonacci sequence?",
)
You can also use Ollama through the LiteLLMModel if you prepend “ollama_chat/” to the model name, like so:
from smolagents import CodeAgent, LiteLLMModel

model = LiteLLMModel(
    model_id="ollama_chat/qwen2.5-coder:14b-instruct-q4_K_M",
    api_base="http://localhost:11434",  # default local Ollama server
    num_ctx=8192,  # Ollama defaults to a 2048-token context, too small for agent runs
)
agent = CodeAgent(tools=[], model=model, add_base_tools=True)
agent.run(
"Could you give me the 40th number in the Fibonacci sequence?",
)
Works like a charm!