Days failing to do a basic test with AutoTrain, please help

Hi all,

First of all, thank you very much for taking the time to read and answer my question.

I’m new to NLP; I only heard about Hugging Face a couple of weeks ago.

Since then, I’ve been struggling from morning to night to run a simple test to see how AutoTrain (and AutoTrain Advanced) work.

I’m writing this after racking my brain over different attempts, and I have a question:

Is it possible that the reason I didn’t manage to run the LLM fine-tuning task is that I didn’t add a payment method?

When I try to use AutoTrain, after selecting a dummy CSV with text and target columns (2 columns and 4 rows, just for testing purposes), it stops the process and automatically selects the 5-candidate mode, requesting a payment method to continue. This is my first time using it, with only 4 rows and 2 columns, just to see what Hugging Face AutoTrain is and how it works.
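For context, here is a sketch of the kind of dummy CSV I mean (the column names and contents below are just placeholders I made up; AutoTrain lets you map whatever columns your file has during project setup):

```python
import pandas as pd

# A 4-row, 2-column dummy dataset -- just enough to exercise AutoTrain.
df = pd.DataFrame({
    "text": [
        "great product",
        "terrible service",
        "works fine",
        "would not buy again",
    ],
    "target": [1, 0, 1, 0],
})
df.to_csv("dummy.csv", index=False)
print(pd.read_csv("dummy.csv").shape)  # -> (4, 2)
```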

When I try to use AutoTrain Advanced for LLM fine-tuning, I always get errors. I tried different LLMs and different datasets.

The process starts and then stops, sometimes without returning the cause of the error, and sometimes the error is “requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://api.autotrain.huggingface.co/projects/X/start_training”.

I must mention that since I only want to test and get familiar with Hugging Face, my estimated costs are always: “Estimated cost: $0.00. Note: clicking on ‘Create Project’ will start training and incur charges!”

So my question is: do I need to add a payment method to test AutoTrain (and am I otherwise wasting my time thinking the trial is free), or do you think I’m doing something technically wrong?

Thank you.

Attached is an image of the screen I keep getting, regardless of the task size:


Hi, I am getting the same error and assuming the same explanation: I have no payment method configured yet, although I intend to add one.
Any clues out there?


Hi, I find myself in the same situation, and I’ve been struggling with this issue for the past three days…

Here is a screenshot of my space with the proper settings to train the model and the loaded dataset:

And here are the logs I receive once I create the project:

INFO task_type: LLM Finetuning
INFO model_choice: HuggingFace Hub
INFO Updating hub model choices for task: lm_training, model_choice: HuggingFace Hub
Traceback (most recent call last):
  File "/app/env/lib/python3.9/site-packages/gradio/routes.py", line 442, in run_predict
    output = await app.get_blocks().process_api(
  File "/app/env/lib/python3.9/site-packages/gradio/blocks.py", line 1392, in process_api
    result = await self.call_function(
  File "/app/env/lib/python3.9/site-packages/gradio/blocks.py", line 1097, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/app/env/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/app/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/app/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/app/env/lib/python3.9/site-packages/gradio/utils.py", line 703, in wrapper
    response = f(*args, **kwargs)
  File "/app/src/autotrain/app.py", line 170, in _update_col_map
    data_cols = pd.read_csv(training_data[0].name, nrows=2).columns.tolist()
TypeError: 'NoneType' object is not subscriptable
INFO Estimating costs…
INFO model_choice: HuggingFace Hub
INFO Updating hub model choices for task: lm_training, model_choice: HuggingFace Hub
INFO Estimating costs…
INFO Estimating costs…
INFO Estimating costs…
INFO Estimating number of samples
INFO Estimating costs for: num_models: 3, task: lm_training, num_samples: 3996
INFO Getting project cost…
INFO Sending GET request to https://api.autotrain.huggingface.co/pricing/compute?username=JordanLaforet&task_id=9&num_samples=3996&num_models=3
INFO Estimated_cost: 0
INFO 🚨🚨🚨 Creating project: h1fq-m1lc-mxul
INFO 🚨 Task: lm_training
INFO 🚨 Training data: [<tempfile._TemporaryFileWrapper object at 0x7f6e8e13c850>]
INFO 🚨 Validation data: None
INFO 🚨 Training params: [{"hub_model": "meta-llama/Llama-2-7b-hf", "num_models": 3}]
INFO 🚨 Hub model: meta-llama/Llama-2-7b-hf
INFO 🚨 Estimated cost: 0.0
INFO 🚨 Can pay: False
INFO Dataset: h1fq-m1lc-mxul (lm_training)
Train data: ['/tmp/gradio/552a0fe9ac975ff47df0e51f77a96d2ef40f8e6a/Dataset.csv']
Valid data:
Column mapping: {'text': 'training_data'}

Pushing dataset shards to the dataset hub: 0%| | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format: 0%| | 0/4 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|██████████| 4/4 [00:00<00:00, 656.93ba/s]

Pushing dataset shards to the dataset hub: 100%|██████████| 1/1 [00:00<00:00, 2.97it/s]

Pushing dataset shards to the dataset hub: 0%| | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format: 0%| | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 712.95ba/s]

Pushing dataset shards to the dataset hub: 100%|██████████| 1/1 [00:00<00:00, 5.17it/s]

Downloading metadata: 0%| | 0.00/466 [00:00<?, ?B/s]
Downloading metadata: 100%|██████████| 466/466 [00:00<00:00, 7.73MB/s]

INFO 🚀🚀🚀 Creating project h1fq-m1lc-mxul, task: lm_training
INFO 🚀 Using username: JordanLaforet
INFO 🚀 Using param_choice: autotrain
INFO 🚀 Using hub_model: meta-llama/Llama-2-7b-hf
INFO 🚀 Using job_params: [{'hub_model': 'meta-llama/Llama-2-7b-hf', 'num_models': 3, 'task': 'lm_training'}]
INFO 🚀 Creating project h1fq-m1lc-mxul, task: lm_training
INFO 🚀 Creating project with payload: {'username': 'JordanLaforet', 'proj_name': 'h1fq-m1lc-mxul', 'task': 9, 'config': {'advanced': True, 'autotrain': True, 'language': 'unk', 'max_models': 3, 'hub_model': 'meta-llama/Llama-2-7b-hf', 'params': [{'hub_model': 'meta-llama/Llama-2-7b-hf', 'task': 'lm_training'}]}}
INFO Creating project with payload: {'username': 'JordanLaforet', 'proj_name': 'h1fq-m1lc-mxul', 'task': 9, 'config': {'advanced': True, 'autotrain': True, 'language': 'unk', 'max_models': 3, 'hub_model': 'meta-llama/Llama-2-7b-hf', 'params': [{'hub_model': 'meta-llama/Llama-2-7b-hf', 'task': 'lm_training'}]}}
INFO Sending POST request to https://api.autotrain.huggingface.co/projects/create
INFO Sending POST request to https://api.autotrain.huggingface.co/projects/81027/data/start_processing
INFO ⏳ Waiting for data processing to complete …
INFO Sending GET request to https://api.autotrain.huggingface.co/projects/81027
INFO Sending GET request to https://api.autotrain.huggingface.co/projects/81027
INFO ✅ Data processing complete!
INFO 🚀 Approving project # 81027
INFO Sending POST request to https://api.autotrain.huggingface.co/projects/81027/start_training
Traceback (most recent call last):
  File "/app/env/lib/python3.9/site-packages/gradio/routes.py", line 442, in run_predict
    output = await app.get_blocks().process_api(
  File "/app/env/lib/python3.9/site-packages/gradio/blocks.py", line 1392, in process_api
    result = await self.call_function(
  File "/app/env/lib/python3.9/site-packages/gradio/blocks.py", line 1097, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/app/env/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/app/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/app/env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/app/env/lib/python3.9/site-packages/gradio/utils.py", line 703, in wrapper
    response = f(*args, **kwargs)
  File "/app/src/autotrain/app.py", line 503, in _create_project
    project.approve(project_id)
  File "/app/src/autotrain/project.py", line 201, in approve
    _ = http_post(
  File "/app/src/autotrain/utils.py", line 94, in http_post
    response.raise_for_status()
  File "/app/env/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://api.autotrain.huggingface.co/projects/81027/start_training

It appears that the ‘/start_processing’ step runs fine, but the ‘/start_training’ step encounters an issue when making a request to the API. As I’m relatively new to Hugging Face, I’m unsure about the nature of the problem. Any assistance you could provide would be greatly appreciated!
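One way to dig a little further (just a sketch, not AutoTrain’s own code): `raise_for_status()`, the last frame in both tracebacks, only reports the status code and URL, while the response body usually says *why* the request was rejected. Printing the body before raising would surface it. The response below is simulated, and its body is entirely hypothetical; the real API may say something different.

```python
import requests

def explain_failure(response: requests.Response) -> str:
    """Summarize a non-2xx response, including the body the server sent.

    raise_for_status() only shows the status code and URL; the body
    often explains the actual reason for the rejection.
    """
    return f"{response.status_code} {response.reason}: {response.text[:500]}"

# Simulated 400 standing in for the AutoTrain API reply.
# The body here is hypothetical -- print response.text yourself
# (e.g. inside http_post in autotrain/utils.py) to see the real one.
resp = requests.Response()
resp.status_code = 400
resp.reason = "Bad Request"
resp.encoding = "utf-8"
resp._content = b'{"detail": "hypothetical reason, e.g. no payment method"}'

print(explain_failure(resp))
```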

Has anyone ever fixed this? I have this very same issue trying to fine-tune llama-2-7b-chat-hf using AutoTrain Advanced locally (as of August 10, 2023), and I could not find anything else online.


Yes, take a look at my answer in this topic:

Cheers,

I’m not seeing any answers that will help me here. I’m having the exact same problem as the OP: I cannot select the free tier; it auto-selects the highest paid tier and demands payment.

Short answer: add a payment method.

You are welcome. 🙂


Short answer: no. If there’s a free method, it should be free. If not, don’t advertise a free method.


So, is there no free way to train models on Hugging Face using the free CPU option?

Damn, you got left hanging.