Error in AutoTrain Text Classification

I’m really struggling with the new AutoTrain tool. The old one worked really well, but this new one is really frustrating.

I get this error when running the text classifier:

INFO: 10.16.41.118:17966 - "POST /create_project HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app( # type: ignore[func-returns-value]
  File "/app/env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/app/env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/app/env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/app/env/lib/python3.10/site-packages/fastapi/routing.py", line 274, in app
    raw_response = await run_endpoint_function(
  File "/app/env/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/app/src/autotrain/app.py", line 414, in handle_form
    data_path = dset.prepare()
  File "/app/src/autotrain/dataset.py", line 257, in prepare
    label_column = self.column_mapping["label"]
KeyError: 'label'

INFO hardware: A10G Large
INFO Task: text_multi_class_classification
INFO Column mapping: {'text': 'text', 'target': 'target'}
INFO Dataset: autotrain-8giwo-evjbj (text_multi_class_classification)
Train data: [<tempfile.SpooledTemporaryFile object at 0x7efb5dc27fd0>]
Valid data:
Column mapping: {'text': 'text', 'target': 'target'}

Per the Text Classification instructions, it should only be looking for a text column and a target column, so I believe looking for 'label' is either a bug or the documentation is incorrect.

I really loved AutoTrain prior to this new setup and was constantly recommending it to my students. I can no longer do so in its current form.

If I rename my target column and its key to 'label', I can get past the above error, but then a new error appears:

INFO: 10.16.41.118:31056 - "POST /create_project HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app( # type: ignore[func-returns-value]
  File "/app/env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/app/env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/app/env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/app/env/lib/python3.10/site-packages/fastapi/routing.py", line 274, in app
    raw_response = await run_endpoint_function(
  File "/app/env/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/app/src/autotrain/app.py", line 414, in handle_form
    data_path = dset.prepare()
  File "/app/src/autotrain/dataset.py", line 258, in prepare
    preprocessor = TextMultiClassClassificationPreprocessor(
  File "<string>", line 14, in __init__
  File "/app/src/autotrain/preprocessor/text.py", line 35, in __post_init__
    raise ValueError(f"{self.text_column} not in train data")
ValueError: text not in train data

INFO hardware: A10G Large
INFO Task: text_multi_class_classification
INFO Column mapping: {'text': 'text', 'label': 'label'}
INFO Dataset: autotrain-8giwo-evjbj (text_multi_class_classification)
Train data: [<tempfile.SpooledTemporaryFile object at 0x7efb5dc26260>]
Valid data:
Column mapping: {'text': 'text', 'label': 'label'}

Please see the docs for the correct data format: AutoTrain

In the UI, you are also provided with a small info button (i) next to each input. Please use it and read about fields such as column mapping.
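Since the preprocessor raises `ValueError: text not in train data` when the column it looks for is absent, a quick local look at the header as pandas parses it can rule out hidden problems. Two possibilities worth checking (assumptions, not confirmed causes in this thread) are a byte-order mark at the start of the file and stray spaces in the column names. A minimal sketch, assuming the CSV is meant to have `text` and `target` columns; `check_columns` is a hypothetical helper, not part of AutoTrain:

```python
import io

import pandas as pd

EXPECTED = {"text", "target"}  # assumed column names; adjust to your file


def check_columns(csv_source) -> set:
    """Return the expected column names that pandas cannot find in the CSV."""
    # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark if present
    df = pd.read_csv(csv_source, encoding="utf-8-sig", nrows=5)
    # repr() makes hidden characters such as trailing spaces visible
    print("columns as parsed:", [repr(c) for c in df.columns])
    return EXPECTED - set(df.columns)


if __name__ == "__main__":
    # Made-up header with a trailing space after "text"
    sample = io.BytesIO(b"text ,target\nhello,positive\n")
    print("missing:", check_columns(sample))
```

Pass the path of your real file (e.g. `check_columns("data.csv")`); an empty set means both expected names were found exactly.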

OK, the exact same setup I used yesterday no longer produces the same errors today. I do, however, get new errors.

CUDA Version 12.1.1

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
NVIDIA Cloud Native Technologies - NVIDIA Docs.

INFO: Will watch for changes in these directories: ['/app']
WARNING: "workers" flag is ignored when reloading is enabled.
INFO: Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit)
INFO: Started reloader process [35] using StatReload

INFO Authenticating user…
WARNING Parameters not supplied by user and set to default: data_path, warmup_ratio, optimizer, evaluation_strategy, project_name, weight_decay, save_strategy, model, add_eos_token, lora_alpha, auto_find_batch_size, trainer, scheduler, repo_id, push_to_hub, text_column, valid_split, username, token, model_max_length, rejected_text_column, apply_chat_template, merge_adapter, train_split, save_total_limit, batch_size, logging_steps, max_grad_norm, disable_gradient_checkpointing, prompt_text_column, seed, lr, lora_dropout, lora_r, use_flash_attention_2, dpo_beta, gradient_accumulation, model_ref
WARNING Parameters not supplied by user and set to default: data_path, warmup_ratio, log, train_split, optimizer, save_total_limit, batch_size, logging_steps, evaluation_strategy, weight_decay, project_name, save_strategy, model, max_grad_norm, seed, lr, auto_find_batch_size, scheduler, max_seq_length, repo_id, target_column, epochs, push_to_hub, text_column, valid_split, gradient_accumulation, username, token
WARNING Parameters not supplied by user and set to default: data_path, warmup_ratio, log, train_split, optimizer, save_total_limit, batch_size, logging_steps, evaluation_strategy, weight_decay, project_name, save_strategy, model, max_grad_norm, image_column, seed, auto_find_batch_size, scheduler, lr, repo_id, target_column, epochs, push_to_hub, valid_split, gradient_accumulation, username, token
WARNING Parameters not supplied by user and set to default: data_path, warmup_ratio, optimizer, evaluation_strategy, project_name, weight_decay, save_strategy, model, lora_alpha, auto_find_batch_size, scheduler, repo_id, epochs, push_to_hub, text_column, valid_split, username, token, target_modules, train_split, max_target_length, batch_size, logging_steps, save_total_limit, max_grad_norm, quantization, seed, lr, lora_dropout, lora_r, max_seq_length, target_column, gradient_accumulation, peft
WARNING Parameters not supplied by user and set to default: time_limit, num_trials, data_path, train_split, numerical_columns, project_name, seed, model, repo_id, target_columns, push_to_hub, categorical_columns, id_column, task, valid_split, username, token
WARNING Parameters not supplied by user and set to default: resume_from_checkpoint, num_class_images, prior_loss_weight, checkpointing_steps, sample_batch_size, bf16, project_name, model, tokenizer, num_validation_images, scheduler, validation_images, rank, warmup_steps, repo_id, checkpoints_total_limit, epochs, image_path, lr_power, pre_compute_text_embeddings, adam_beta2, push_to_hub, validation_epochs, num_cycles, allow_tf32, username, token, scale_lr, validation_prompt, text_encoder_use_attention_mask, class_labels_conditioning, max_grad_norm, xl, seed, adam_beta1, logging, class_prompt, class_image_path, revision, adam_epsilon, prior_preservation, adam_weight_decay, local_rank, prior_generation_precision, dataloader_num_workers, tokenizer_max_length, center_crop
INFO: Started server process [37]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: 10.16.41.118:47826 - "GET /?logs=container HTTP/1.1" 200 OK
INFO: 10.16.20.172:29787 - "GET /logo.png HTTP/1.1" 304 Not Modified
INFO Task: llm:sft
INFO: 10.16.18.44:58565 - "GET /params/llm%3Asft HTTP/1.1" 200 OK
INFO: 10.16.41.118:47826 - "GET /model_choices/llm%3Asft HTTP/1.1" 200 OK
INFO Task: text-classification
INFO: 10.16.20.172:43136 - "GET /params/text-classification HTTP/1.1" 200 OK
INFO: 10.16.41.118:50739 - "GET /model_choices/text-classification HTTP/1.1" 200 OK
INFO: 10.16.41.118:18498 - "GET /help/column_mapping_info HTTP/1.1" 200 OK
INFO hardware: A10G Large
INFO Task: text_multi_class_classification
INFO Column mapping: {'text': 'text', 'label': 'target'}
INFO Dataset: autotrain-3aiws-78jpt (text_multi_class_classification)
Train data: [<tempfile.SpooledTemporaryFile object at 0x7f1710034fa0>]
Valid data:
Column mapping: {'text': 'text', 'label': 'target'}

INFO: 10.16.41.118:56664 - "POST /create_project HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app( # type: ignore[func-returns-value]
  File "/app/env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/app/env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/app/env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/app/env/lib/python3.10/site-packages/fastapi/routing.py", line 274, in app
    raw_response = await run_endpoint_function(
  File "/app/env/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/app/src/autotrain/app.py", line 414, in handle_form
    data_path = dset.prepare()
  File "/app/src/autotrain/dataset.py", line 271, in prepare
    return preprocessor.prepare()
  File "/app/src/autotrain/preprocessor/text.py", line 79, in prepare
    train_df, valid_df = self.split()
  File "/app/src/autotrain/preprocessor/text.py", line 57, in split
    train_df, valid_df = train_test_split(
  File "/app/env/lib/python3.10/site-packages/sklearn/utils/_param_validation.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/sklearn/model_selection/_split.py", line 2638, in train_test_split
    train, test = next(cv.split(X=arrays[0], y=stratify))
  File "/app/env/lib/python3.10/site-packages/sklearn/model_selection/_split.py", line 2197, in split
    y = check_array(y, input_name="y", ensure_2d=False, dtype=None)
  File "/app/env/lib/python3.10/site-packages/sklearn/utils/validation.py", line 959, in check_array
    _assert_all_finite(
  File "/app/env/lib/python3.10/site-packages/sklearn/utils/validation.py", line 109, in _assert_all_finite
    raise ValueError("Input contains NaN")
ValueError: Input contains NaN

This appears to be an issue with the data splitting process. I have verified that my CSV is valid, and it has about 1,500 lines, so I do not believe there is too little data to split.

Additional details: I am attempting to run the Docker version of AutoTrain from within HF Spaces. Perhaps the Docker image there is not up to date?

I don't believe it was a rude response. I asked you to click the little info button next to column mapping, which explains what column mapping really is, but from your response it looks like you didn't. You seem to have changed the AutoTrain column mapping: the key on the left-hand side should be label, not target.

If your dataset has text and target columns, you don't need to touch the column mapping field. This is also in the docs:

As long as you follow the data format in the docs, you do NOT have to change column mapping.
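To make the key/value distinction concrete: each key in the mapping is a name AutoTrain uses internally, and each value is the matching column in your CSV. For a file with `text` and `target` columns that gives `{"text": "text", "label": "target"}`, the mapping seen in the log from the second day of this thread. The first post's mapping, `{"text": "text", "target": "target"}`, has no `"label"` key, which matches the `KeyError: 'label'` raised at `self.column_mapping["label"]`. The sketch below imitates the renaming step with pandas; `apply_column_mapping` is a hypothetical helper, and AutoTrain's own code may apply the mapping differently.

```python
import pandas as pd

# Keys: names AutoTrain expects internally. Values: columns in *your* CSV.
# This matches a dataset whose columns are "text" and "target".
column_mapping = {"text": "text", "label": "target"}


def apply_column_mapping(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Hypothetical helper: rename user columns to the internal names."""
    missing = [col for col in mapping.values() if col not in df.columns]
    if missing:
        raise ValueError(f"columns not in data: {missing}")
    # Invert the mapping (user column name -> internal name) and rename
    return df.rename(columns={user: internal for internal, user in mapping.items()})


if __name__ == "__main__":
    data = pd.DataFrame({"text": ["great film", "dull"], "target": ["positive", "negative"]})
    print(apply_column_mapping(data, column_mapping).columns.tolist())
```

The check for missing values in the mapping mirrors the kind of error seen above (`text not in train data`): the right-hand side must name columns that really exist in the file.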

Here I am successfully running text classification on a dataset that has two columns, text and target:

Here are the first few lines from the dataset:

~/Downloads/Datasets ❯ head -5 imdb1k.csv
text,target
"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.<br /><br />The first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.<br /><br />It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.<br /><br />I would say the main appeal of the show is due to the fact that it goes where other shows wouldn't dare. Forget pretty pictures painted for mainstream audiences, forget charm, forget romance...OZ doesn't mess around. The first episode I ever saw struck me as so nasty it was surreal, I couldn't say I was ready for it, but as I watched more, I developed a taste for Oz, and got accustomed to the high levels of graphic violence. Not just violence, but injustice (crooked guards who'll be sold out for a nickel, inmates who'll kill on order and get away with it, well mannered, middle class inmates being turned into prison bitches due to their lack of street skills or prison experience) Watching Oz, you may become comfortable with what is uncomfortable viewing....thats if you can get in touch with your darker side.",positive
"A wonderful little production. <br /><br />The filming technique is very unassuming- very old-time-BBC fashion and gives a comforting, and sometimes discomforting, sense of realism to the entire piece. <br /><br />The actors are extremely well chosen- Michael Sheen not only ""has got all the polari"" but he has all the voices down pat too! You can truly see the seamless editing guided by the references to Williams' diary entries, not only is it well worth the watching but it is a terrificly written and performed piece. A masterful production about one of the great master's of comedy and his life. <br /><br />The realism really comes home with the little things: the fantasy of the guard which, rather than use the traditional 'dream' techniques remains solid then disappears. It plays on our knowledge and our senses, particularly with the scenes concerning Orton and Halliwell and the sets (particularly of their flat with Halliwell's murals decorating every surface) are terribly well done.",positive
"I thought this was a wonderful way to spend time on a too hot summer weekend, sitting in the air conditioned theater and watching a light-hearted comedy. The plot is simplistic, but the dialogue is witty and the characters are likable (even the well bread suspected serial killer). While some may be disappointed when they realize this is not Match Point 2: Risk Addiction, I thought it was proof that Woody Allen is still fully in control of the style many of us have grown to love.<br /><br />This was the most I'd laughed at one of Woody's comedies in years (dare I say a decade?). While I've never been impressed with Scarlet Johanson, in this she managed to tone down her ""sexy"" image and jumped right into a average, but spirited young woman.<br /><br />This may not be the crown jewel of his career, but it was wittier than ""Devil Wears Prada"" and more interesting than ""Superman"" a great comedy to go see with friends.",positive
"Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fighting all the time.<br /><br />This movie is slower than a soap opera... and suddenly, Jake decides to become Rambo and kill the zombie.<br /><br />OK, first of all when you're going to make a film you must Decide if its a thriller or a drama! As a drama the movie is watchable. Parents are divorcing & arguing like in real life. And then we have Jake with his closet which totally ruins all the film! I expected to see a BOOGEYMAN similar movie, and instead i watched a drama with some meaningless thriller spots.<br /><br />3 out of 10 just for the well playing parents & descent dialogs. As for the shots with Jake: just ignore them.",negative
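The file above is ordinary CSV: a `text,target` header, one example per row, and double quotes around fields that contain commas or quotes, with inner quotes doubled (as in `""has got all the polari""`). If you build such a file in Python, letting pandas handle the quoting avoids malformed rows. A minimal sketch with made-up example rows:

```python
import io

import pandas as pd

# Made-up example rows; the first text contains a comma and embedded quotes.
rows = {
    "text": ['He said "wow", twice.', "Slow, dull, and too long."],
    "target": ["positive", "negative"],
}
df = pd.DataFrame(rows)

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # no index column; header is text,target
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the same two columns and values.
roundtrip = pd.read_csv(io.StringIO(csv_text))
```

In practice you would write to a file path instead of a `StringIO` buffer, e.g. `df.to_csv("train.csv", index=False)`.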

It's a little annoying that you changed that pop-up last night, because I did read it, but now it's different. Please acknowledge when you make changes based on my feedback, rather than appearing to blame me for not doing everything I can before posting here.

There are, however, new errors, so please respond to those when you can. Again, I really loved the previous AutoTrain product and hope to be able to use it again. But the bugs here make it unusable, and I know how to train models. The purpose of AutoTrain is to make it easy for anyone to train, and it's absolutely not that right now.

It's a little annoying that you changed that pop-up last night, because I did read it, but now it's different. Please acknowledge when you make changes based on my feedback, rather than appearing to blame me for not doing everything I can before posting here.

No, it hasn't been changed for a while. You can see the repo here: GitHub - huggingface/autotrain-advanced: 🤗 AutoTrain Advanced. The only recent change was upgrading the transformers version. The text in the pop-ups was changed last month:

Now I see you also edited your post. The error says you have missing data, and you should not have missing data in your CSV. Missing data here means some rows are missing either the text or the target value.

You can remove rows with NaN/missing data using:

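import pandas as pd

# Load the CSV and drop every row that has a missing (NaN) value in any column
df = pd.read_csv("data.csv")
df = df.dropna()
# Save without the pandas index so the file keeps only the original columns
df.to_csv("cleaned_data.csv", index=False)

Then use cleaned_data.csv.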
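Before dropping anything, it can help to see which rows pandas treats as missing, since a file that looks valid can still have an empty `text` or `target` cell (for example, a line ending in a trailing comma with nothing after it). A sketch, assuming the columns are named `text` and `target`; the CSV content is made up for illustration:

```python
import io

import pandas as pd

# Made-up CSV: the second data row has an empty target, the third an empty text.
raw = "text,target\ngreat film,positive\nno label here,\n,negative\n"
df = pd.read_csv(io.StringIO(raw))

# Rows where either required column is missing (NaN after parsing)
bad_rows = df[df[["text", "target"]].isna().any(axis=1)]
print(bad_rows)

# Drop only rows missing one of the two required columns
clean = df.dropna(subset=["text", "target"])
```

Note that a cell containing only spaces is read as a string, not NaN, so it would not be caught here.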

Again, I really loved the previous AutoTrain product and hope to be able to use it again. But the bugs here make it unusable, and I know how to train models. The purpose of AutoTrain is to make it easy for anyone to train, and it's absolutely not that right now.

The old AutoTrain would have just shown "failed" without any information about errors in the dataset. At least now you know what is causing the error and can fix the dataset.

Hopefully it will work and you'll like it again 🙂

My point about the old AutoTrain was that the interface seemed less developer-focused. I'm not sure it even makes sense to have a mapping function visible in this way. And I apologize if the pop-up hasn't changed, but it's still really confusing: it seems to imply "target" is the key.

I have hit additional errors that I'd like to share, in case you want to consider fixes for them in a no-code environment. No-code, to me, implies it will fix errors for you without your having to read stack traces.

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

I understand this error. It might make sense to ask users whether they want to drop these classes instead of requiring them to edit the dataset.
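Until something like that exists in the UI, the rare classes can be dropped locally before uploading. A sketch with pandas, assuming the label column is `target` and a minimum of 2 examples per class, the threshold named in the error message; `drop_rare_classes` is a hypothetical helper:

```python
import pandas as pd


def drop_rare_classes(df: pd.DataFrame, label_col: str = "target",
                      min_count: int = 2) -> pd.DataFrame:
    """Remove rows whose class has fewer than `min_count` examples."""
    counts = df[label_col].value_counts()
    rare = counts[counts < min_count].index
    if len(rare):
        print(f"dropping {len(rare)} class(es): {list(rare)}")
    # Keep only rows whose label is not in the rare set
    return df[~df[label_col].isin(rare)].reset_index(drop=True)
```

For example, `drop_rare_classes(pd.read_csv("data.csv")).to_csv("cleaned_data.csv", index=False)`. A higher `min_count` gives each remaining class more rows to spread across a stratified train/validation split, at the cost of discarding more data.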

ValueError: Number of classes in train and valid are not the same. Training has 23 and valid has 17

This seems to be a problem with the splitting function. I'm unsure how to fix this myself without adding more data. Is there a way to edit the split % in the AutoTrain params? Again, this seems like something a no-code solution would solve for me, or at least prompt me with solutions for.
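One workaround, assuming AutoTrain uses an uploaded validation file as-is instead of re-splitting the training file (the empty `Valid data:` line in the logs suggests the form accepts one), is to do the split yourself so that every class appears on both sides. A minimal sketch with pandas; `per_class_split`, the 20% validation fraction, and the seed are illustrative choices, not AutoTrain settings:

```python
import pandas as pd


def per_class_split(df: pd.DataFrame, label_col: str = "target",
                    valid_frac: float = 0.2, seed: int = 42):
    """Split so that every class with >= 2 rows lands in both train and valid.

    Classes with a single row cannot appear in both splits and are skipped.
    """
    train_parts, valid_parts = [], []
    for _, group in df.groupby(label_col):
        if len(group) < 2:
            continue  # cannot be placed in both splits
        group = group.sample(frac=1.0, random_state=seed)  # shuffle within the class
        # At least one row goes to valid, and at least one stays in train
        n_valid = min(max(1, round(len(group) * valid_frac)), len(group) - 1)
        valid_parts.append(group.iloc[:n_valid])
        train_parts.append(group.iloc[n_valid:])
    train_df = pd.concat(train_parts).reset_index(drop=True)
    valid_df = pd.concat(valid_parts).reset_index(drop=True)
    return train_df, valid_df
```

Write the two frames with `train_df.to_csv("train.csv", index=False)` and `valid_df.to_csv("valid.csv", index=False)` and upload them as training and validation data. Splitting within each class trades a little randomness for the guarantee that no class is missing from validation; single-row classes still need to be removed or given more examples.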

Either way, I can work with this! Thanks for your help 🙂 and sorry for taking my frustrations out on you.

Hi @abhishek. I have the same issue (“ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.”), but my data seems to be in order. I haven’t modified the column mapping field.

Also, in the hardware dropdown menu, I only have the option “local”, even though I have the badge “running on A10G”.

Any idea what the problem could be?

I have the same error. How can I solve this problem? Thanks!

It means that in your dataset's target column, some classes have very few samples (the least populated one has only a single example). Try adding more samples for those classes.
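If the data looks fine but the error persists, one thing worth ruling out (an assumption, not a confirmed cause in this thread) is labels that differ only in case or surrounding whitespace, such as `positive` and `Positive `, which pandas and scikit-learn count as separate classes. A sketch, assuming a `target` column of string labels with missing rows already removed; both helpers are hypothetical:

```python
import pandas as pd


def singleton_classes(df: pd.DataFrame, label_col: str = "target") -> list:
    """Return labels that occur only once (these trigger the stratify error)."""
    counts = df[label_col].value_counts()
    return sorted(counts[counts < 2].index)


def normalize_labels(df: pd.DataFrame, label_col: str = "target") -> pd.DataFrame:
    """Strip whitespace and lowercase labels.

    Only do this if case and spacing carry no meaning in your labels;
    astype(str) would turn NaN into "nan", so drop missing rows first.
    """
    out = df.copy()
    out[label_col] = out[label_col].astype(str).str.strip().str.lower()
    return out
```

If `singleton_classes(df)` lists near-duplicates of common labels but `singleton_classes(normalize_labels(df))` is empty, the labels need cleaning rather than more data.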