Hello Hugging Face Community and Support,
I am attempting to fine-tune the Qwen/Qwen2.5-3B-Instruct LLM using AutoTrain Advanced on a Hugging Face Space, with my custom dataset. My dataset is in the "Instruction, Input, Output" format, where the "Input" column is often empty.
I am consistently encountering a ValueError related to OMP_NUM_THREADS, which prevents the training from starting. The error message is:
```
libgomp: Invalid value for environment variable OMP_NUM_THREADS
libgomp: Invalid value for environment variable OMP_NUM_THREADS
Traceback (most recent call last):
    args.num_cpu_threads_per_process = get_int_from_env(["OMP_NUM_THREADS"], 1)
  File "/app/env/lib/python3.10/site-packages/accelerate/utils/environment.py", line 77, in get_int_from_env
    val = int(os.environ.get(e, -1))
ValueError: invalid literal for int() with base 10: '3500m'
```
Context and Steps Taken:
- Model and Task: Qwen2.5-3B-Instruct, LLM Supervised Fine-Tuning (SFT).
- Dataset: My dataset (aiwaah/yjDataset) is structured with instruction, input (often empty), and output columns. I have mapped these correctly in my training_params.json.
- Hardware: The Space is running on a Tesla T4 GPU.
- training_params.json (Custom JSON): I am using the following configuration. I initially had "distributed_backend": "ddp", but changed it to "distributed_backend": null to rule out multi-GPU configuration as a cause. The JSON is now syntactically valid.
```json
{
  "auto_find_batch_size": "false",
  "chat_template": "none",
  "disable_gradient_checkpointing": "false",
  "distributed_backend": null,
  "eval_strategy": "epoch",
  "merge_adapter": "false",
  "mixed_precision": "fp16",
  "optimizer": "adamw_torch",
  "peft": "true",
  "padding": "right",
  "quantization": "int4",
  "scheduler": "linear",
  "unsloth": "false",
  "use_flash_attention_2": "false",
  "batch_size": "2",
  "block_size": "1024",
  "epochs": "3",
  "gradient_accumulation": "4",
  "lr": "0.00003",
  "logging_steps": "-1",
  "lora_alpha": "32",
  "lora_dropout": "0.05",
  "lora_r": "16",
  "max_grad_norm": "1",
  "model_max_length": "2048",
  "save_total_limit": "1",
  "seed": "42",
  "warmup_ratio": "0.1",
  "weight_decay": "0",
  "target_modules": "all-linear",
  "instruction_column": "instruction",
  "input_column": "input",
  "output_column": "output"
}
```
- Attempted OMP_NUM_THREADS Override: Understanding that '3500m' looks like a CPU resource request, I explicitly added an environment variable OMP_NUM_THREADS with a value of 4 (and also tried 1) in the "Variables and secrets" section of my Hugging Face Space settings. (Please refer to the attached screenshots.)
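For reference, the failure itself is easy to reproduce outside the Space, since accelerate's get_int_from_env() effectively passes the raw environment string straight to int(). A minimal sketch in plain Python (no accelerate install needed):

```python
import os

# Simulate what the Space's environment appears to inject: a Kubernetes-style
# millicore CPU request ("3500m") in OMP_NUM_THREADS.
os.environ["OMP_NUM_THREADS"] = "3500m"

# accelerate's get_int_from_env() does int(os.environ.get(...)) on this value,
# and int() cannot parse the trailing "m" suffix.
err = ""
try:
    int(os.environ.get("OMP_NUM_THREADS", -1))
except ValueError as exc:
    err = str(exc)

print(err)  # invalid literal for int() with base 10: '3500m'
```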
The Problem:
Despite setting OMP_NUM_THREADS in the Space variables, the accelerate library consistently receives '3500m' when it tries to parse this environment variable, leading to the ValueError. This suggests that my explicit environment variable setting is being overridden by an underlying system configuration within the Space's environment.
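In case it is useful for discussion: one hypothetical workaround, assuming code can run before AutoTrain/accelerate reads the variable (e.g. in a custom startup script, which may not be possible in a stock AutoTrain Space), would be to normalize a millicore-style value to a plain integer first. The function name below is my own invention, not part of any library:

```python
import os
import re

def sanitize_omp_num_threads() -> None:
    """Rewrite a Kubernetes-style millicore value (e.g. '3500m') in
    OMP_NUM_THREADS into a plain integer so later int() parses succeed."""
    raw = os.environ.get("OMP_NUM_THREADS", "")
    m = re.fullmatch(r"(\d+)m", raw)
    if m:
        # 3500 millicores -> 3 full CPUs, but never fewer than 1 thread
        os.environ["OMP_NUM_THREADS"] = str(max(1, int(m.group(1)) // 1000))

sanitize_omp_num_threads()
```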
Space URL: [Your Hugging Face Space URL here, e.g., https://huggingface.co/spaces/aiwaah/autotrain-advanced]
Any guidance or assistance in resolving this environment variable conflict would be greatly appreciated. Thank you!