SmolAgents: Trying to run an Agent with a local model (Mistral)

I have the following code, but when I run it, I get the following error:

“Error in generating model output: local_llama_response() got an unexpected keyword argument ‘stop_sequences’”

I assume that I need to format the prompt in a specific way for the LLaMA model, but I’m not sure how to do it. Could you please assist me?

I’ve also tried passing the model directly to the CodeAgent constructor, but it didn’t work.
Code:

from llama_cpp import Llama
from smolagents import CodeAgent

# Load the LLaMA model
llm = Llama(model_path="../models/mistral-7b-instruct-v0.1.Q4_K_M.gguf", n_ctx=2048)

# Define a wrapper function to avoid 'stop_sequences' error
def local_llama_response(prompt):
    """Generate a response using the local LLaMA model."""
    response = llm(prompt, max_tokens=100, stop=["\n"])  # Stop passed correctly
    return response["choices"][0]["text"]

agent = CodeAgent(tools=[get_capture_channels_recording_paths, get_recording_server], model=local_llama_response)

while True:
    user_input = input("Type something (or 'bye' to exit): ").strip().lower()
    if user_input == "bye":
        print("Goodbye!")
        break

    #print(f"You said: {user_input}")
    agent.run(user_input)

Thanks!


It’s difficult to hook this up to llama.cpp, because llama.cpp isn’t officially supported on the smolagents side yet.

Separately, even TransformersModel may fail when max_tokens is too small, so that could be part of what you’re seeing as well.

Ok,

If running LLaMA 2 locally isn’t possible yet, are there any other models I can run locally on my machine?

Thanks!


Actually, it’s possible to do this without directly loading the GGUF yourself. The idea is to have Ollama load the GGUF and run it as a server, and then have smolagents access that server. There seems to be an example on GitHub below, and other people have probably written guides as well.

If you want to use GGUF, you might want to try setting up an Ollama server. Ollama itself is easier to use than Llama.cpp.
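
For example, once the Ollama server is running (ollama serve) and a model has been pulled (ollama pull mistral), smolagents can talk to it through LiteLLMModel. This is only a minimal sketch: the model name, the default port 11434, and the num_ctx value are assumptions based on the smolagents documentation, so adjust them for your setup.

# Minimal sketch, assuming `ollama pull mistral` has been run and the Ollama
# server is listening on its default port. Adjust names/ports for your setup.
from smolagents import CodeAgent, LiteLLMModel

model = LiteLLMModel(
    model_id="ollama_chat/mistral",      # model name as registered in Ollama (assumption)
    api_base="http://127.0.0.1:11434",   # default Ollama endpoint
    num_ctx=8192,                        # code agents need a fairly large context window
)

agent = CodeAgent(tools=[], model=model)  # add your own tools here
agent.run("What is 2 ** 10?")

The point of going through Ollama is that the GGUF quantization is handled on the server side, so smolagents only talks to a chat API instead of llama.cpp directly.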

Also, as I mentioned above, there is the option of using TransformersModel. That class literally loads the un-quantized model locally via Transformers. Many models can be used this way, but since they are pre-quantization weights, they are large. I think support for quantizing on load is currently in progress.
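
If you want to try that route, it would look roughly like the sketch below. The model id and the max_new_tokens value are only placeholders, and I’m assuming your smolagents version exposes TransformersModel; you also need enough memory for the unquantized weights.

# Minimal sketch, assuming enough memory for the unquantized weights.
from smolagents import CodeAgent, TransformersModel

model = TransformersModel(
    model_id="Qwen/Qwen2.5-Coder-7B-Instruct",  # any instruct model from the Hub (placeholder)
    max_new_tokens=2048,                        # too small a value tends to break the agent loop
    device_map="auto",                          # place the model on available GPU(s)/CPU
)

agent = CodeAgent(tools=[], model=model)  # add your own tools here
agent.run("What is 2 ** 10?")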
