Hi, I'm in the same situation: I've been struggling with this issue for the past three days.
Here is a screenshot of my space with the proper settings to train the model and the loaded dataset:
And here are the logs I receive once I create the project:
INFO task_type: LLM Finetuning
INFO model_choice: HuggingFace Hub
INFO Updating hub model choices for task: lm_training, model_choice: HuggingFace Hub
Traceback (most recent call last):
File “/app/env/lib/python3.9/site-packages/gradio/routes.py”, line 442, in run_predict
output = await app.get_blocks().process_api(
File “/app/env/lib/python3.9/site-packages/gradio/blocks.py”, line 1392, in process_api
result = await self.call_function(
File “/app/env/lib/python3.9/site-packages/gradio/blocks.py”, line 1097, in call_function
prediction = await anyio.to_thread.run_sync(
File “/app/env/lib/python3.9/site-packages/anyio/to_thread.py”, line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File “/app/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py”, line 877, in run_sync_in_worker_thread
return await future
File “/app/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py”, line 807, in run
result = context.run(func, *args)
File “/app/env/lib/python3.9/site-packages/gradio/utils.py”, line 703, in wrapper
response = f(*args, **kwargs)
File “/app/src/autotrain/app.py”, line 170, in _update_col_map
data_cols = pd.read_csv(training_data[0].name, nrows=2).columns.tolist()
TypeError: ‘NoneType’ object is not subscriptable
INFO Estimating costs…
INFO model_choice: HuggingFace Hub
INFO Updating hub model choices for task: lm_training, model_choice: HuggingFace Hub
INFO Estimating costs…
INFO Estimating costs…
INFO Estimating costs…
INFO Estimating number of samples
INFO Estimating costs for: num_models: 3, task: lm_training, num_samples: 3996
INFO Getting project cost…
INFO Sending GET request to https://api.autotrain.huggingface.co/pricing/compute?username=JordanLaforet&task_id=9&num_samples=3996&num_models=3
INFO Estimated_cost: 0
INFO Creating project: h1fq-m1lc-mxul
INFO Task: lm_training
INFO Training data: [<tempfile._TemporaryFileWrapper object at 0x7f6e8e13c850>]
INFO Validation data: None
INFO Training params: [{“hub_model”: “meta-llama/Llama-2-7b-hf”, “num_models”: 3}]
INFO Hub model: meta-llama/Llama-2-7b-hf
INFO Estimated cost: 0.0
INFO :Can pay: False
INFO Dataset: h1fq-m1lc-mxul (lm_training)
Train data: [‘/tmp/gradio/552a0fe9ac975ff47df0e51f77a96d2ef40f8e6a/Dataset.csv’]
Valid data:
Column mapping: {‘text’: ‘training_data’}
Pushing dataset shards to the dataset hub: 0%| | 0/1 [00:00<?, ?it/s]
Creating parquet from Arrow format: 0%| | 0/4 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|██████████| 4/4 [00:00<00:00, 656.93ba/s]
Pushing dataset shards to the dataset hub: 100%|██████████| 1/1 [00:00<00:00, 2.97it/s]
Pushing dataset shards to the dataset hub: 100%|██████████| 1/1 [00:00<00:00, 2.97it/s]
Pushing dataset shards to the dataset hub: 0%| | 0/1 [00:00<?, ?it/s]
Creating parquet from Arrow format: 0%| | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 712.95ba/s]
Pushing dataset shards to the dataset hub: 100%|██████████| 1/1 [00:00<00:00, 5.17it/s]
Pushing dataset shards to the dataset hub: 100%|██████████| 1/1 [00:00<00:00, 5.17it/s]
Downloading metadata: 0%| | 0.00/466 [00:00<?, ?B/s]
Downloading metadata: 100%|██████████| 466/466 [00:00<00:00, 7.73MB/s]
INFO Creating project h1fq-m1lc-mxul, task: lm_training
INFO Using username: JordanLaforet
INFO Using param_choice: autotrain
INFO Using hub_model: meta-llama/Llama-2-7b-hf
INFO Using job_params: [{‘hub_model’: ‘meta-llama/Llama-2-7b-hf’, ‘num_models’: 3, ‘task’: ‘lm_training’}]
INFO Creating project h1fq-m1lc-mxul, task: lm_training
INFO Creating project with payload: {‘username’: ‘JordanLaforet’, ‘proj_name’: ‘h1fq-m1lc-mxul’, ‘task’: 9, ‘config’: {‘advanced’: True, ‘autotrain’: True, ‘language’: ‘unk’, ‘max_models’: 3, ‘hub_model’: ‘meta-llama/Llama-2-7b-hf’, ‘params’: [{‘hub_model’: ‘meta-llama/Llama-2-7b-hf’, ‘task’: ‘lm_training’}]}}
INFO Creating project with payload: {‘username’: ‘JordanLaforet’, ‘proj_name’: ‘h1fq-m1lc-mxul’, ‘task’: 9, ‘config’: {‘advanced’: True, ‘autotrain’: True, ‘language’: ‘unk’, ‘max_models’: 3, ‘hub_model’: ‘meta-llama/Llama-2-7b-hf’, ‘params’: [{‘hub_model’: ‘meta-llama/Llama-2-7b-hf’, ‘task’: ‘lm_training’}]}}
INFO Sending POST request to https://api.autotrain.huggingface.co/projects/create
INFO Sending POST request to https://api.autotrain.huggingface.co/projects/81027/data/start_processing
INFO Waiting for data processing to complete …
INFO Sending GET request to https://api.autotrain.huggingface.co/projects/81027
INFO Sending GET request to https://api.autotrain.huggingface.co/projects/81027
INFO Data processing complete!
INFO Approving project # 81027
INFO Sending POST request to https://api.autotrain.huggingface.co/projects/81027/start_training
Traceback (most recent call last):
File “/app/env/lib/python3.9/site-packages/gradio/routes.py”, line 442, in run_predict
output = await app.get_blocks().process_api(
File “/app/env/lib/python3.9/site-packages/gradio/blocks.py”, line 1392, in process_api
result = await self.call_function(
File “/app/env/lib/python3.9/site-packages/gradio/blocks.py”, line 1097, in call_function
prediction = await anyio.to_thread.run_sync(
File “/app/env/lib/python3.9/site-packages/anyio/to_thread.py”, line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File “/app/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py”, line 877, in run_sync_in_worker_thread
return await future
File “/app/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py”, line 807, in run
result = context.run(func, *args)
File “/app/env/lib/python3.9/site-packages/gradio/utils.py”, line 703, in wrapper
response = f(*args, **kwargs)
File “/app/src/autotrain/app.py”, line 503, in _create_project
project.approve(project_id)
File “/app/src/autotrain/project.py”, line 201, in approve
_ = http_post(
File “/app/src/autotrain/utils.py”, line 94, in http_post
response.raise_for_status()
File “/app/env/lib/python3.9/site-packages/requests/models.py”, line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://api.autotrain.huggingface.co/projects/81027/start_training
It appears that the ‘/start_processing’ step runs fine, but the POST request to ‘/start_training’ fails with a 400 Bad Request error. There is also an earlier TypeError in _update_col_map, raised while training_data was still None, although the project was created afterwards regardless. As I'm relatively new to Hugging Face, I'm unsure what is causing this. Any help would be greatly appreciated!
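In case it helps whoever looks at this: the traceback only shows the status line of the 400 response, not its body, and the body is probably where the API states the actual reason. Below is a minimal sketch of how I could surface it, assuming the exception is a requests HTTPError carrying a `response` attribute. `summarize_http_error` is a hypothetical helper of mine, not part of autotrain, and the fake response and its "detail" message are invented so the snippet runs offline.

```python
import json
from types import SimpleNamespace


def summarize_http_error(exc):
    """Return a one-line summary of an HTTP error, including the response body.

    Works with any exception that carries a `response` attribute exposing
    `status_code`, `url` and `text` (as requests.exceptions.HTTPError does).
    """
    response = getattr(exc, "response", None)
    if response is None:
        return f"no response attached: {exc}"
    body = response.text
    try:
        # Many APIs return JSON error bodies; normalise them if that is the case.
        body = json.dumps(json.loads(body), sort_keys=True)
    except ValueError:
        pass  # Not JSON: keep the raw text.
    return f"{response.status_code} for {response.url}: {body}"


# Stand-in for the real exception so the sketch runs without network access.
# The "detail" message is a placeholder, not what the API actually returned.
fake = Exception("400 Client Error")
fake.response = SimpleNamespace(
    status_code=400,
    url="https://api.autotrain.huggingface.co/projects/81027/start_training",
    text='{"detail": "placeholder message"}',
)
print(summarize_http_error(fake))
```

With the real code, the same helper could be called in an `except requests.exceptions.HTTPError as exc:` block around the failing request, so the log would show the server's explanation rather than only "400 Client Error".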