AutoTrain LLM on Spaces: OMP_NUM_THREADS ValueError: '3500m' despite explicit ENV var setting

Hello Hugging Face Community and Support,

I am attempting to fine-tune the Qwen/Qwen2.5-3B-Instruct LLM using AutoTrain Advanced on a Hugging Face Space, with my custom dataset. My dataset is in the “Instruction, Input, Output” format, where the “Input” column is often empty.
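For illustration, a couple of rows in that shape look like this (made-up examples, not actual rows from my dataset):

# Made-up rows illustrating the "Instruction, Input, Output" shape of the dataset;
# the "input" field is frequently just an empty string.
example_rows = [
    {
        "instruction": "Translate the sentence into French.",
        "input": "Good morning, everyone.",
        "output": "Bonjour à tous.",
    },
    {
        "instruction": "Explain what a learning rate is in one sentence.",
        "input": "",  # the Input column is often empty
        "output": "The learning rate controls how large each optimization step is.",
    },
]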

I am consistently encountering a ValueError related to OMP_NUM_THREADS, which prevents the training from starting. The error message is:

libgomp: Invalid value for environment variable OMP_NUM_THREADS
libgomp: Invalid value for environment variable OMP_NUM_THREADS
Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1122, in _validate_launch_command
    args.num_cpu_threads_per_process = get_int_from_env(["OMP_NUM_THREADS"], 1)
  File "/app/env/lib/python3.10/site-packages/accelerate/utils/environment.py", line 77, in get_int_from_env
    val = int(os.environ.get(e, -1))
ValueError: invalid literal for int() with base 10: '3500m'

Context and Steps Taken:

  1. Model and Task: Qwen2.5-3B-Instruct, LLM Supervised Fine-Tuning (SFT).
  2. Dataset: My dataset (aiwaah/yjDataset) is structured with instruction, input (often empty), and output columns. I have mapped these correctly in my training_params.json.
  3. Hardware: The Space is running on a Tesla T4 GPU.
  4. training_params.json (Custom JSON): I am using the following configuration. I initially had "distributed_backend": "ddp", but changed it to "distributed_backend": null to rule out multi-GPU configurations as a cause. The JSON is now syntactically valid.

{
  "auto_find_batch_size": "false",
  "chat_template": "none",
  "disable_gradient_checkpointing": "false",
  "distributed_backend": null,
  "eval_strategy": "epoch",
  "merge_adapter": "false",
  "mixed_precision": "fp16",
  "optimizer": "adamw_torch",
  "peft": "true",
  "padding": "right",
  "quantization": "int4",
  "scheduler": "linear",
  "unsloth": "false",
  "use_flash_attention_2": "false",
  "batch_size": "2",
  "block_size": "1024",
  "epochs": "3",
  "gradient_accumulation": "4",
  "lr": "0.00003",
  "logging_steps": "-1",
  "lora_alpha": "32",
  "lora_dropout": "0.05",
  "lora_r": "16",
  "max_grad_norm": "1",
  "model_max_length": "2048",
  "save_total_limit": "1",
  "seed": "42",
  "warmup_ratio": "0.1",
  "weight_decay": "0",
  "target_modules": "all-linear",
  "instruction_column": "instruction",
  "input_column": "input",
  "output_column": "output"
}

  5. Attempted OMP_NUM_THREADS Override: Understanding that '3500m' looks like a CPU resource request, I explicitly added an environment variable OMP_NUM_THREADS with a value of 4 (and also tried 1) in the “Variables and secrets” section of my Hugging Face Space settings. (Please refer to the attached screenshots).

The Problem:

Despite setting OMP_NUM_THREADS in the Space variables, the accelerate library consistently receives '3500m' when it tries to parse this environment variable, leading to the ValueError. This suggests that my explicit environment variable setting is being overridden by an underlying system configuration within the Space’s environment.
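For reference, the failing step is easy to reproduce outside AutoTrain (a minimal sketch based only on the traceback above):

import os

# This mirrors what the traceback shows accelerate doing in get_int_from_env:
# int() cannot parse a Kubernetes millicore string such as "3500m".
raw = os.environ.get("OMP_NUM_THREADS", "-1")
print("OMP_NUM_THREADS as seen by this process:", repr(raw))

try:
    print("Parsed thread count:", int(raw))
except ValueError:
    # "3500m" means 3.5 CPUs in Kubernetes notation, hence the ValueError.
    print("int() cannot parse", repr(raw), "- the same ValueError accelerate raises")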

Space URL: [Your Hugging Face Space URL here, e.g., https://huggingface.co/spaces/aiwaah/autotrain-advanced]

Any guidance or assistance in resolving this environment variable conflict would be greatly appreciated. Thank you!


Hello Hugging Face support.

I ran into the same issue. When starting up a T4 instance, I see these error logs.

Downloading autotrain_advanced-0.8.36-py3-none-any.whl (341 kB)
Installing collected packages: autotrain-advanced
Successfully installed autotrain-advanced-0.8.36

libgomp: Invalid value for environment variable OMP_NUM_THREADS

libgomp: Invalid value for environment variable OMP_NUM_THREADS

libgomp: Invalid value for environment variable OMP_NUM_THREADS

libgomp: Invalid value for environment variable OMP_NUM_THREADS
INFO     | 2025-08-13 06:15:30 | autotrain.app.ui_routes:<module>:31 - Starting AutoTrain...

And when I start training, the same error happens and training fails.

INFO     | 2025-08-13 06:16:49 | autotrain.commands:launch_command:515 - {'data_path': 'daekeun-ml/naver-news-summarization-ko', 'model': 'google-t5/t5-small', 'username': 'miseyu', 'seed': 42, 'train_split': 'train', 'valid_split': 'validation', 'project_name': 'autotrain-fyjjm-hsd86', 'token': '*****', 'push_to_hub': True, 'text_column': 'document', 'target_column': 'summary', 'lr': 5e-05, 'epochs': 3, 'max_seq_length': 128, 'max_target_length': 128, 'batch_size': 2, 'warmup_ratio': 0.1, 'gradient_accumulation': 1, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'logging_steps': -1, 'eval_strategy': 'epoch', 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'save_total_limit': 1, 'peft': False, 'quantization': 'int8', 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'target_modules': 'all-linear', 'log': 'tensorboard', 'early_stopping_patience': 5, 'early_stopping_threshold': 0.01}
INFO     | 2025-08-13 06:16:49 | autotrain.backends.local:create:25 - Training PID: 82
INFO:     10.16.9.222:39639 - "POST /ui/create_project HTTP/1.1" 200 OK

libgomp: Invalid value for environment variable OMP_NUM_THREADS
INFO:     10.16.9.8:2289 - "GET /ui/is_model_training HTTP/1.1" 200 OK

libgomp: Invalid value for environment variable OMP_NUM_THREADS
Traceback (most recent call last):
  File "/app/env/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/app/env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/app/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1146, in launch_command
    args, defaults, mp_from_config_flag = _validate_launch_command(args)
  File "/app/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1122, in _validate_launch_command
    args.num_cpu_threads_per_process = get_int_from_env(["OMP_NUM_THREADS"], 1)
  File "/app/env/lib/python3.10/site-packages/accelerate/utils/environment.py", line 77, in get_int_from_env
    val = int(os.environ.get(e, -1))
ValueError: invalid literal for int() with base 10: '7500m'
INFO     | 2025-08-13 06:16:56 | autotrain.app.utils:get_running_jobs:40 - Killing PID: 82
INFO     | 2025-08-13 06:16:56 | autotrain.app.utils:kill_process_by_pid:90 - Sent SIGTERM to process with PID 82

I tried setting the env var to 4, 6, etc., but it didn't work.

And when I do the same on CPU-only instances, it does not happen.

Thanks for any help.


If we could somehow pass the num_cpu_threads_per_process argument to accelerate, or set the OMP_NUM_THREADS environment variable ourselves, we could definitely solve this problem…
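For example, if we controlled the launch command instead of going through the AutoTrain UI, a sketch like this should avoid the bad parse (the script name and thread count are placeholders; whether AutoTrain exposes either option is exactly the open question):

import os
import subprocess

# Copy the current environment, but replace the millicore value with a plain
# integer so that both libgomp and accelerate's int() call see something valid.
env = dict(os.environ)
env["OMP_NUM_THREADS"] = "4"  # arbitrary example thread count

subprocess.run(
    [
        "accelerate", "launch",
        "--num_cpu_threads_per_process", "4",  # explicit value for accelerate
        "train.py",                            # placeholder training script
    ],
    env=env,
    check=True,
)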

Having exactly the same problem a few days later. Quite frustrating; I wanted to quickly check a LoRA use case with something that didn't involve the whole rigmarole, and it took me most of the afternoon to figure out that there was just no way of making it work.


Thank you @John6666 .

In my case the OMP_NUM_THREADS variable is set to '7500m', but it should be an integer, as shown in the logs above.

I couldn't figure out why it was set to that value. Maybe the Docker image or the OS?

I tried to override it with an integer value by setting the environment variable in the Space settings, but it didn't work.

I also tried another instance type, L40S, and that worked well.

Thank you.


Has anyone discovered a solution yet?

I am having the exact same problem trying to fine-tune Hermes-4-14B in a Space running on an Nvidia A10G Small.


This error appears to occur because accelerate is being executed with the OMP_NUM_THREADS environment variable set to a Kubernetes millicore-formatted CPU count (e.g., 3500m, 7500m).

To work around this, you could modify the environment variable immediately before executing accelerate, pass --num_cpu_threads_per_process, patch the accelerate library code to ignore the millicore notation, or have the HF side properly sanitize this environment variable (a rough sketch of the first option is below).
Alternatively, find an environment where the OMP_NUM_THREADS environment variable is an integer.
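A minimal sketch of that first option, assuming the snippet can run in the process that later spawns accelerate (the rounding from millicores to threads is an arbitrary choice):

import math
import os

def sanitize_omp_num_threads(default: int = 1) -> None:
    """Replace a Kubernetes millicore value like '3500m' in OMP_NUM_THREADS
    with a plain integer that libgomp and accelerate can parse."""
    raw = os.environ.get("OMP_NUM_THREADS", "")
    if raw.endswith("m") and raw[:-1].isdigit():
        # '3500m' means 3.5 CPUs; round up and keep at least one thread.
        os.environ["OMP_NUM_THREADS"] = str(max(1, math.ceil(int(raw[:-1]) / 1000)))
    elif not raw.isdigit():
        os.environ["OMP_NUM_THREADS"] = str(default)

sanitize_omp_num_threads()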

Sanitizing the environment variable on the HF side seems preferable… @abhishek @hysts ?


Hi, I’m not quite sure why I was mentioned here – I’m not really familiar with AutoTrain.
If the issue is about the AutoTrain Dockerfile, I would recommend posting it directly on the official GitHub repository.


Sorry, and thanks. I'm not really familiar with AutoTrain either…
I mentioned you because I couldn't tell whether this was an AutoTrain-specific issue or a general HF Spaces problem (it usually doesn't crash anything, but it becomes noticeable when using AutoTrain). It seems similar errors can occur in Spaces other than AutoTrain too.

libgomp: Invalid value for environment variable OMP_NUM_THREADS

Not sure if the error in that Space is directly related to the issue in this thread.

From the logs, the real problem is that the dependency libraries are not pinned.
If you set the versions below and upgrade gradio to the latest version, the Space can launch and run properly without throwing the environment variable error.

torch==1.13.1
torchvision==0.14.1
numpy<2

I’m not familiar with AutoTrain, so I might be wrong and can’t test it myself, but maybe you can try setting the environment variable before this line.
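Something as small as this, placed early enough, might be all it takes (the value 4 is just an example):

import os

# Overwrite the millicore-formatted value with a plain integer before
# libgomp and accelerate's launcher read it; "4" is an arbitrary choice.
os.environ["OMP_NUM_THREADS"] = "4"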


I think it can probably be avoided using the method hysts suggested. If there were hardware with an odd value stored in the OMP_NUM_THREADS environment variable, I could test it, but in the free CPU Space OMP_NUM_THREADS is set to 2, and in the Zero GPU Space it's set to 16. With those values it should work fine whether the fix is applied or not…

It seems to be an error that only reproduces on some of the PAYG (pay-as-you-go) Space hardware. :innocent: