anebot
March 20, 2025, 2:52pm
1
I have the code below, but when I run it, I get the following error:
“Error in generating model output: local_llama_response() got an unexpected keyword argument ‘stop_sequences’”
I assume that I need to format the prompt in a specific way for the LLaMA model, but I’m not sure how to do it. Could you please assist me?
I’ve also tried passing the model directly to the CodeAgent constructor, but it didn’t work.
Code:

```python
from llama_cpp import Llama
from smolagents import CodeAgent

# Load the LLaMA model
llm = Llama(model_path="../models/mistral-7b-instruct-v0.1.Q4_K_M.gguf", n_ctx=2048)

# Define a wrapper function to avoid the 'stop_sequences' error
def local_llama_response(prompt):
    """Generate a response using the local LLaMA model."""
    response = llm(prompt, max_tokens=100, stop=["\n"])  # Stop passed correctly
    return response["choices"][0]["text"]

agent = CodeAgent(
    tools=[get_capture_channels_recording_paths, get_recording_server],
    model=local_llama_response,
)

while True:
    user_input = input("Type something (or 'bye' to exit): ").strip().lower()
    if user_input == "bye":
        print("Goodbye!")
        break
    # print(f"You said: {user_input}")
    agent.run(user_input)
```
Thanks!
It’s difficult to hook this up to llama.cpp because it hasn’t been officially supported on the smolagents side yet.
Regardless, even TransformersModel may not work if max_tokens is small, so that could also be part of what you’re hitting.
GitHub issue (bug), opened 11:48AM 29 Jan 25 UTC, closed 05:35PM 13 Feb 25 UTC:
**Describe the bug**
When replacing `HfApiModel` with `TransformersModel` in `examples/benchmark.ipynb`, the eval results for `meta-llama/Llama-3.1-8B-Instruct` (and various other published models) are far worse than published (scores of less than 5).
**Code to reproduce the error**
https://github.com/danielkorat/smolagents/blob/transformers/examples/benchmark-transformers.ipynb
**Error logs (if any)**
Seems like a big part of the problem is the parsing of the LLM output (specifically the assistant role):

[screenshot omitted]

Also, the regex parsing error arises in nearly all examples.
**Expected behavior**
Trying to reproduce the results for `meta-llama/Llama-3.1-8B-Instruct`, as published in the original notebook:

[screenshot omitted]
**Packages version:**
```python
>>> smolagents.__version__
'1.5.0.dev'
```
**Additional context**
```bash
accelerate==1.3.0
datasets==3.1.0
matplotlib==3.10.0
matplotlib-inline==0.1.7
numpy==1.26.4
seaborn==0.13.2
sentence-transformers==3.3.0
sympy==1.13.1
transformers==4.48.1
```
Pull request: main ← ryantzr1:llama-cpp-models (opened 12:54PM 31 Jan 25 UTC). Closes #449.
## Description
This pull request introduces the `LlamaCppModel` class to the **smolagents** library, enabling seamless integration with `llama.cpp` models. This enhancement expands the library's versatility, allowing users to leverage the optimized performance and efficiency offered by `llama.cpp` for running large language models.
### **Features Added:**
- **LlamaCppModel Class:** A new class to interact with `llama.cpp` models, supporting both local model loading and Hugging Face repository integration.
- **Parameter Handling:** Comprehensive parameter management, including GPU layers, context size, and maximum token generation.
- **Conditional Tool Integration:** Tools are integrated only when explicitly provided, ensuring optimized resource utilization.
### **Motivation and Context**
Integrating `llama.cpp` models into **smolagents** addresses the growing demand for efficient and resource-optimized language model interactions. This addition allows users to benefit from `llama.cpp`'s capabilities while maintaining the flexibility and functionality that **smolagents** offers.
### **How Has This Been Tested?**
- **Integration Tests:**
Tested the `LlamaCppModel` within a `CodeAgent` to ensure seamless interaction and tool usage.
### **Example Usage from text_to_sql.py:**
```python
from smolagents import LlamaCppModel, CodeAgent
from smolagents.tools import SQLTool  # Assume SQLTool is predefined

# Initialize the SQL tool
sql_engine = SQLTool(...)

# Initialize the LlamaCppModel
model = LlamaCppModel(
    repo_id="bartowski/Qwen2.5-7B-Instruct-1M-GGUF",
    filename="Qwen2.5-7B-Instruct-1M-IQ2_M.gguf",
    n_ctx=8192,
    max_tokens=8192,
)

# Create the CodeAgent with the SQL tool and LlamaCppModel
agent = CodeAgent(
    tools=[sql_engine],
    model=model,
)

# Run the agent with a prompt
response = agent.run("Can you give me the name of the client who got the most expensive receipt?")
print(response.content)
# Output: "The client with the most expensive receipt is Woodrow Wilson."
```
anebot
March 24, 2025, 2:50pm
3
Ok,
If running LLaMA 2 locally isn’t possible yet, are there any other models I can run locally on my machine?
Thanks!
Actually, it’s possible to do this without directly loading the GGUF yourself. The idea is to have Ollama load the GGUF and run it as a server, and then have smolagents talk to that server. There’s an example on GitHub below, and other people have probably written guides as well.
If you want to use GGUF, you might want to try setting up an Ollama server. Ollama itself is easier to use than llama.cpp.
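For example, with the LiteLLMModel class that smolagents provides (the model name and port below are assumptions; point them at whatever you have pulled into Ollama):

```python
from smolagents import CodeAgent, LiteLLMModel

# Assumes an Ollama server running locally with a model already pulled,
# e.g. `ollama pull qwen2.5-coder:7b` (the model name is illustrative).
model = LiteLLMModel(
    model_id="ollama_chat/qwen2.5-coder:7b",  # "ollama_chat/" routes through LiteLLM to Ollama
    api_base="http://localhost:11434",        # default Ollama endpoint
    num_ctx=8192,                             # Ollama's default context window is small for agents
)

agent = CodeAgent(tools=[], model=model)
agent.run("What is 2 to the power of 10?")
```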
Also, as I mentioned above, there is the option of using TransformersModel. It literally loads the original, unquantized model locally. Many kinds of models can be used this way, but since they are unquantized, they are large. I think support for quantizing during loading is currently in progress.
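A minimal sketch of that route (the model id is only an illustration; pick anything that fits on your machine, and check the smolagents docs for the exact constructor arguments):

```python
from smolagents import CodeAgent, TransformersModel

# Loads the full, unquantized weights locally via transformers.
model = TransformersModel(
    model_id="Qwen/Qwen2.5-Coder-1.5B-Instruct",  # illustrative; any HF chat model
    max_new_tokens=4096,  # keep this generous; a small budget truncates the agent's output
)

agent = CodeAgent(tools=[], model=model)
agent.run("What is 2 to the power of 10?")
```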
This is a repo with a number of examples using the smolagents framework from Hugging Face.