Hi everyone, I’m new to LLMs. I’m trying to use a Hugging Face Space to train a model. In the “Hardware” drop-down menu, I chose “Local”. Training fails with a “No GPU found” error. Here’s the error log:
> INFO Starting local training...
> INFO {"model":"tiiuae/falcon-rw-1b","project_name":"autotrain-test-1","data_path":"autotrain-test-1/autotrain-data","train_split":"train","valid_split":null,"add_eos_token":true,"block_size":1024,"model_max_length":2048,"padding":"right","trainer":"sft","use_flash_attention_2":false,"log":"tensorboard","disable_gradient_checkpointing":false,"logging_steps":-1,"evaluation_strategy":"epoch","save_total_limit":1,"save_strategy":"epoch","auto_find_batch_size":false,"mixed_precision":"fp16","lr":0.00003,"epochs":3,"batch_size":2,"warmup_ratio":0.1,"gradient_accumulation":1,"optimizer":"adamw_torch","scheduler":"linear","weight_decay":0.0,"max_grad_norm":1.0,"seed":42,"apply_chat_template":false,"quantization":"int4","target_modules":"","merge_adapter":false,"peft":true,"lora_r":16,"lora_alpha":32,"lora_dropout":0.05,"model_ref":null,"dpo_beta":0.1,"prompt_text_column":"autotrain_prompt","text_column":"autotrain_text","rejected_text_column":"autotrain_rejected_text","push_to_hub":true,"repo_id":"aidbar/autotrain-test-1","username":"aidbar","token":"hf_**********************************"}
> WARNING No GPU found. Forcing training on CPU. This will be super slow!
> INFO ['accelerate', 'launch', '--cpu', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-test-1/training_params.json']
> INFO Training PID: 69
INFO: 10.16.41.118:39290 - "POST /create_project HTTP/1.1" 200 OK
INFO: 10.16.18.44:54445 - "GET /accelerators HTTP/1.1" 200 OK
> INFO Running jobs: [69]
INFO: 10.16.18.44:9047 - "GET /is_model_training HTTP/1.1" 200 OK
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `0`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
> INFO Running jobs: [69]
INFO: 10.16.6.1:5462 - "GET /is_model_training HTTP/1.1" 200 OK
[2024-02-14 21:17:27,044] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/app/env/lib/python3.10/site-packages/trl/trainer/ppo_config.py:141: UserWarning: The `optimize_cuda_cache` arguement will be deprecated soon, please use `optimize_device_cache` instead.
warnings.warn(
🚀 INFO | 2024-02-14 21:17:27 | __main__:process_input_data:41 - loading dataset from disk
🚀 INFO | 2024-02-14 21:17:27 | __main__:process_input_data:82 - Train data: Dataset({
features: ['instruction', 'input', 'output', 'autotrain_text'],
num_rows: 52002
})
🚀 INFO | 2024-02-14 21:17:27 | __main__:process_input_data:83 - Valid data: None
tokenizer_config.json: 0%| | 0.00/234 [00:00<?, ?B/s]
tokenizer_config.json: 100%|██████████| 234/234 [00:00<00:00, 1.29MB/s]
vocab.json: 0%| | 0.00/798k [00:00<?, ?B/s]
vocab.json: 100%|██████████| 798k/798k [00:00<00:00, 99.0MB/s]
merges.txt: 0%| | 0.00/456k [00:00<?, ?B/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 76.7MB/s]
tokenizer.json: 0%| | 0.00/2.11M [00:00<?, ?B/s]
tokenizer.json: 100%|██████████| 2.11M/2.11M [00:00<00:00, 40.3MB/s]
special_tokens_map.json: 0%| | 0.00/99.0 [00:00<?, ?B/s]
special_tokens_map.json: 100%|██████████| 99.0/99.0 [00:00<00:00, 569kB/s]
config.json: 0%| | 0.00/1.05k [00:00<?, ?B/s]
config.json: 100%|██████████| 1.05k/1.05k [00:00<00:00, 6.36MB/s]
configuration_falcon.py: 0%| | 0.00/6.70k [00:00<?, ?B/s]
configuration_falcon.py: 100%|██████████| 6.70k/6.70k [00:00<00:00, 31.1MB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-rw-1b:
- configuration_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
modeling_falcon.py: 0%| | 0.00/56.9k [00:00<?, ?B/s]
modeling_falcon.py: 100%|██████████| 56.9k/56.9k [00:00<00:00, 153MB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-rw-1b:
- modeling_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
❌ ERROR | 2024-02-14 21:17:28 | autotrain.trainers.common:wrapper:91 - train has failed due to an exception: Traceback (most recent call last):
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 88, in wrapper
return func(*args, **kwargs)
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/__main__.py", line 186, in train
model = AutoModelForCausalLM.from_pretrained(
File "/app/env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
return model_class.from_pretrained(
File "/app/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3032, in from_pretrained
raise RuntimeError("No GPU found. A GPU is needed for quantization.")
RuntimeError: No GPU found. A GPU is needed for quantization.
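From the traceback, the failure seems tied to `"quantization":"int4"` in the logged config, since the exception says a GPU is needed for quantization. Below is a minimal sketch of the adjustment I think would be needed to run on CPU. The helper `cpu_safe_params` is hypothetical (it is not part of AutoTrain), and I'm assuming that a null `quantization` value disables quantization and that `fp16` mixed precision should also be turned off without a GPU:

```python
import json


def cpu_safe_params(params: dict, has_gpu: bool) -> dict:
    """Return a copy of the training params with GPU-only options disabled.

    Hypothetical helper for illustration; not an AutoTrain function.
    """
    adjusted = dict(params)
    if not has_gpu:
        # Assumption: a null value means "no quantization" for the trainer.
        adjusted["quantization"] = None
        # Assumption: fp16 mixed precision is only useful on a GPU.
        adjusted["mixed_precision"] = None
    return adjusted


# A subset of the config from the log above.
logged = json.loads(
    '{"quantization": "int4", "mixed_precision": "fp16", "peft": true}'
)
fixed = cpu_safe_params(logged, has_gpu=False)
print(json.dumps(fixed))
```

Even if this gets past the exception, the log already warns that CPU training will be very slow, so switching the Space to GPU hardware may be the more practical route.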
How can I fix this?
Thank you in advance.