I tried to migrate from HfApiModel to TransformersModel because I do not wish to incur more cost, but I got this error. I am using ZeroGPU.
```python
model = TransformersModel(
    # model_id="Qwen/Qwen2.5-Coder-14B-Instruct",
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    device_map="cuda",
    max_new_tokens=5000,
    torch_dtype="bfloat16",
)
```
I tried to solve it on my own, but the error persists. What else should I try?
I think it’s the same kind of error as in past cases, which could be avoided by quantization; it’s interesting that it also occurs in float32.
One hypothesis I found is that the cause may be a failure to tokenize a special token.
From a related GitHub issue (opened 9 Nov 2023):
### Describe the issue
Issue: I pulled the latest commits from the repo and tried to run CLI inference; the network is producing NaN as probability outputs.
Command:
```
python -m llava.serve.cli --model-path liuhaotian/llava-v1.5-7b --image-file "https://llava-vl.github.io/static/images/view.jpg" --load-4bit
```
I also tried creating a fresh environment; still the same bug.
I explored the tokens fed into the network and noticed a strange -200 token. Not sure if this is causing the issue; maybe someone can have a look? I'm trying to debug it and will come back here if I have news!
Log:
```
Traceback (most recent call last):
File "/home/riccardoricci/miniconda3/envs/chat_osm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/riccardoricci/miniconda3/envs/chat_osm/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/media/data/Riccardo/chat_with_OSM/LLaVA/llava/serve/cli.py", line 125, in <module>
main(args)
File "/media/data/Riccardo/chat_with_OSM/LLaVA/llava/serve/cli.py", line 95, in main
output_ids = model.generate(
File "/home/riccardoricci/miniconda3/envs/chat_osm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/riccardoricci/miniconda3/envs/chat_osm/lib/python3.10/site-packages/transformers/generation/utils.py", line 1588, in generate
return self.sample(
File "/home/riccardoricci/miniconda3/envs/chat_osm/lib/python3.10/site-packages/transformers/generation/utils.py", line 2678, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```

From another GitHub issue (opened 19 Jul 2023, labeled model-usage):
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
inputs = ...
inputs = tokenizer.batch_encode_plus(inputs, return_tensors="pt", padding=True)
model.generate(**inputs, **generate_kwargs)
```
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
I got this error while doing inference for text generation, in particular when the batch size is greater than 1. I did not get this error, and generation works correctly, when the batch size is set to 1.
Does anyone see the same issue?
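For reference, the quantization workaround mentioned above could look roughly like this (a minimal sketch using plain transformers and bitsandbytes; the 4-bit settings and the prompt are illustrative assumptions, not from this thread):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.2-3B-Instruct"

# Load the weights in 4-bit so generation runs through a quantized path,
# which reportedly avoids the inf/nan probability error in some cases.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="cuda",
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```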
Interesting. I now tried the unsloth Llama 3.2 bnb 4-bit model and it threw another error:
```
RuntimeError: All input tensors need to be on the same GPU, but found some tensors to not be on a GPU:
[(torch.Size([1, 4718592]), device(type='cpu')), (torch.Size([147456]), device(type='cpu')), (torch.Size([3072, 3072]), device(type='cpu'))]
```
I'm not sure how to move the input tensors onto the GPU. I am using stream_to_gradio()
to send the new message to the agent.
Thank you for your help.
Edit: not sure if this is the right approach. I tried
```python
model = TransformersModel(
    # model_id="Qwen/Qwen2.5-Coder-14B-Instruct",
    model_id="unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit",
    device_map="cuda",
)
model.model = model.model.to("cuda")
```
or
```python
model = TransformersModel(
    # model_id="Qwen/Qwen2.5-Coder-14B-Instruct",
    model_id="unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit",
    device_map="cuda",
)
model.model.to("cuda")
```
Edit: neither worked.
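One way to narrow it down might be to load the same 4-bit checkpoint directly with transformers, outside smolagents and stream_to_gradio(), and check whether generation works and where the weights actually land. This is only a diagnostic sketch; it assumes bitsandbytes and accelerate are installed and that the pre-quantized checkpoint loads via AutoModelForCausalLM:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda")

# If this prints any CPU device, some modules were offloaded at load time,
# which would explain the "tensors not on a GPU" error.
print({p.device for p in model.parameters()})

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```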
In the smolagents source below, the prompt tensor is already properly moved with .to(self.model.device)…
I wonder if this is another bug.
```python
    )
else:
    prompt_tensor = self.tokenizer.apply_chat_template(
        messages,
        tools=[get_tool_json_schema(tool) for tool in tools_to_call_from] if tools_to_call_from else None,
        return_tensors="pt",
        return_dict=True,
        add_generation_prompt=True if tools_to_call_from else False,
    )
prompt_tensor = prompt_tensor.to(self.model.device)
count_prompt_tokens = prompt_tensor["input_ids"].shape[1]
if stop_sequences:
    stopping_criteria = self.make_stopping_criteria(
        stop_sequences, tokenizer=self.processor if hasattr(self, "processor") else self.tokenizer
    )
else:
    stopping_criteria = None
out = self.model.generate(
```
How about like this?
```python
model = TransformersModel(
    # model_id="Qwen/Qwen2.5-Coder-14B-Instruct",
    model_id="unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit",
    device_map="cuda",
)
print(model.model.device)
```
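And if the reported device looks right but the error persists, the per-module placement recorded by accelerate could also be worth a look (a small follow-up sketch; hf_device_map is only present when a device map was actually used, hence the getattr):
```python
from smolagents import TransformersModel

model = TransformersModel(
    model_id="unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit",
    device_map="cuda",
)

# hf_device_map is set by accelerate when a device map is used; any module
# mapped to "cpu" or "disk" here would explain the mixed-device RuntimeError.
print(getattr(model.model, "hf_device_map", None))
```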