AI Agents course: error running the smolagents example

Hello, I'm following Unit 1 of the AI Agents course and running the sample application built with smolagents, shown below. The question I asked was: what is the time in EST now?

Any ideas? What am I missing here?

Final answer:
Error in generating final LLM output:
422 Client Error: Unprocessable Entity for url: https://api-inference.huggingface.co/models/Qwen/Qwen2.5-Coder-32B-Instruct/v1/chat/completions (Request ID: Zji2Dy1jhRHQ_jjXJLqQx)

Input validation error: inputs tokens + max_new_tokens must be <= 32768. Given: 89942 inputs tokens and 2096 max_new_tokens
Make sure ‘text-generation’ task is supported by the model.


Input validation error: inputs tokens + max_new_tokens must be <= 32768. Given: 89942 inputs tokens and 2096 max_new_tokens

The error message says the input is too large: the prompt tokens plus max_new_tokens must fit within 32768, but this request used 89,942 input tokens plus 2,096 new tokens, so it can't be processed. If this is the result of following the course as written, then in a broad sense it's a bug in the course material…
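The arithmetic behind the 422 can be sketched as a simple budget check. This is just an illustration; the helper name is hypothetical, and the 32768 limit and the failing numbers come straight from the error message above:

```python
# Hypothetical helper: check a request against the model's context window.
MAX_CONTEXT = 32768  # limit reported in the 422 error

def fits_context(input_tokens: int, max_new_tokens: int, limit: int = MAX_CONTEXT) -> bool:
    """Return True if prompt + generation budget fits in the context window."""
    return input_tokens + max_new_tokens <= limit

print(fits_context(89942, 2096))  # → False (92038 > 32768)
```

So the request fails before generation even starts; either the prompt has to shrink (fewer/shorter tool outputs in the agent's history) or a model with a larger context window is needed.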
If possible, why not try a different model, or raise an issue on the course repository?
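Swapping the model is usually a one-line change. A minimal sketch, assuming smolagents is installed; the model ID here is just an example of a chat model with a larger context window, not a course recommendation:

```python
ALT_MODEL_ID = "meta-llama/Llama-3.3-70B-Instruct"  # example ID, swap in any chat model

def build_agent(model_id: str = ALT_MODEL_ID):
    # Imported lazily so the sketch can be read without smolagents installed.
    from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel
    model = HfApiModel(model_id=model_id)
    return CodeAgent(model=model, tools=[DuckDuckGoSearchTool()])

# agent = build_agent()
# agent.run("What is the time in EST now?")  # requires a valid HF token
```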

I am having issues running the agents using HfApiModel(). Here is the code and the error:
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# Initialize the search tool
search_tool = DuckDuckGoSearchTool()

# Initialize the model
model = HfApiModel(hf_model)

agent = CodeAgent(
    model=model,
    tools=[search_tool],
)

# Example usage
response = agent.run(
    "Search for luxury superhero-themed party ideas, including decorations, entertainment, and catering."
)
print(response)

Error in generating model output:
402 Client Error: Payment Required for url:
https://router.huggingface.co/hf-inference/models/Qwen/QwQ-32B/v1/chat/completions (Request ID:
Root=1-67eb90ed-4dceeaaf55ae88031ef4b296;dc2390e1-cf09-4f27-b530-c55bff8d2db7)

You have exceeded your monthly included credits for Inference Providers. Subscribe to PRO to get 20x more monthly
included credits.

I know that it is asking me to upgrade my HF subscription, but is there any way I can use it for free? I am a beginner with HF.


That's just a case of exceeding the free usage limit, so it's a bit difficult to avoid, but you could work around it by running a local model or by using another provider's API.
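Both workarounds map onto smolagents' interchangeable model classes. A rough sketch, assuming smolagents (and, for the local option, transformers/torch) is installed; the specific model IDs are just examples:

```python
def local_model():
    # Runs entirely on your machine; needs transformers/torch installed
    # and enough RAM/VRAM for the chosen checkpoint.
    from smolagents import TransformersModel
    return TransformersModel(model_id="Qwen/Qwen2.5-Coder-1.5B-Instruct")

def other_provider_model():
    # Routes through LiteLLM to a third-party API; needs that provider's
    # key in the environment, e.g. OPENAI_API_KEY.
    from smolagents import LiteLLMModel
    return LiteLLMModel(model_id="gpt-4o-mini")
```

Either model object can be passed to CodeAgent in place of HfApiModel, so the rest of the course code stays the same.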

The only way I got around this was upgrading my HF account. I know that's not ideal, and it shouldn't be the easiest path forward, but you can downgrade your account after the course is completed.


Yes, most of the errors disappeared once I upgraded my account. It looks like a capacity issue (token counts aren't being properly returned from the model API).

Although I did make some changes to the Gradio_UI.py file to resolve some type issues, similar to the below:

Before

total_input_tokens += agent.model.last_input_token_count

After (add safety check)

token_count = agent.model.last_input_token_count or 0
total_input_tokens += token_count


Original line (line 115)

f" | Input-tokens:{step_log.input_token_count:,} | Output-tokens:{step_log.output_token_count:,}"

Fixed version - add None handling

f" | Input-tokens:{step_log.input_token_count or 0:,} | Output-tokens:{step_log.output_token_count or 0:,}"
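The same `or 0` pattern can be factored into a small helper if token counts are accumulated in several places; a sketch (the helper name is mine, not something in Gradio_UI.py):

```python
def add_tokens(total: int, count) -> int:
    # Treat a missing (None) token count as zero instead of raising a TypeError.
    return total + (count or 0)

# Usage, replacing the bare addition:
# total_input_tokens = add_tokens(total_input_tokens, agent.model.last_input_token_count)
```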


"Error in generating model output:
401 Client Error: Unauthorized for url: https://router.huggingface.co/hf-inference/models/Qwen/Qwen2.5-Coder-32B-Instruct/v1/chat/completions (Request ID: Root=1-680a7e9b-5c2059aa6ca44bed59bec0bc;6c8ff9c7-6358-4d3f-bf84-f596010c2321)

Invalid username or password. Error generating answer"

How do I fix this error?


I updated the requirements file to include smolagents, but I still get the same error.


I fixed the previous problem. Now:

"Error in generating model output:
401 Client Error: Unauthorized for url: https://router.huggingface.co/hf-inference/models/Qwen/Qwen2.5-Coder-32B-Instruct/v1/chat/completions (Request ID: Root=1-680a7e9b-5c2059aa6ca44bed59bec0bc;6c8ff9c7-6358-4d3f-bf84-f596010c2321)

Invalid username or password."

How do I fix this error?


Invalid username or password."

Based on this error message, the token itself may be invalid, or the token actually being sent may be different from the one you think you are passing.

The most reliable method is to call login(), but in many cases the issue can be resolved by setting the HF_TOKEN environment variable (in a Space, under Settings → Secrets).
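A minimal sketch of both options, assuming huggingface_hub is installed; the token value below is a placeholder, never a real credential:

```python
import os

# Option 1: environment variable. In a Space, set HF_TOKEN under
# Settings -> Secrets instead of hard-coding it like this.
os.environ["HF_TOKEN"] = "hf_xxx"  # placeholder, use your real token

# Option 2: explicit login, which validates the token up front.
# from huggingface_hub import login
# login(token=os.environ["HF_TOKEN"])
```

If the 401 persists, check that the token has at least read scope and that no stale token is cached from an earlier login.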