Hello everyone,
I’ve recently encountered an issue while trying to create a project using the AutoTrain LLM platform and would appreciate any insights or assistance.
Issue Summary
While trying to set up a new language model training project using the AutoTrain LLM interface, I ran into an Internal Server Error (HTTP 500)
after submitting the project creation form. The server’s response indicated a problem with the dataset column mapping or its processing.
Detailed Description
- Hardware Used: A100 Large
- Dataset Identifier: 1c3k-zrjq-wp4h (lm_training)
- Data Format: CSV with columns `prompt`, `text`, and `rejected_text`
- Error Message: The system threw a `ValueError` indicating that either `text_column`, or both `prompt_column` and `rejected_text_column`, must be provided.
Here are the relevant parts of the error log for clarity:

```
INFO hardware: A100 Large
INFO Dataset: 1c3k-zrjq-wp4h (lm_training)
Train data: [<tempfile.SpooledTemporaryFile object at 0x7f61dbff07c0>]
Valid data: []
Column mapping: {'text': 'text', 'prompt': 'prompt', 'rejected_text': 'rejected_text'}
INFO: 10.16.34.18:51514 - "POST /create_project HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/app/env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/app/env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/app/env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/app/env/lib/python3.10/site-packages/fastapi/routing.py", line 274, in app
    raw_response = await run_endpoint_function(
  File "/app/env/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/app/src/autotrain/app.py", line 258, in handle_form
    dset.prepare()
  File "/app/src/autotrain/dataset.py", line 302, in prepare
    preprocessor = LLMPreprocessor(
  File "<string>", line 13, in __init__
  File "/app/src/autotrain/preprocessor/text.py", line 140, in __post_init__
    raise ValueError("Please provide either text_column or prompt_column and rejected_text_column")
ValueError: Please provide either text_column or prompt_column and rejected_text_column
```
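From the traceback, the check that fires lives in `autotrain/preprocessor/text.py`. I don't have the source in front of me, but judging purely from the error message, the validation presumably looks something like this (my reconstruction, not the actual AutoTrain code):

```python
# Hypothetical reconstruction of the check raising the error above.
# Class and field names come from the traceback; the logic is inferred
# from the error message, not copied from the AutoTrain source.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMPreprocessor:
    text_column: Optional[str] = None
    prompt_column: Optional[str] = None
    rejected_text_column: Optional[str] = None

    def __post_init__(self):
        # Generic LM training needs a text column; DPO additionally
        # needs a prompt/rejected pair.
        if self.text_column is None and (
            self.prompt_column is None or self.rejected_text_column is None
        ):
            raise ValueError(
                "Please provide either text_column or prompt_column and rejected_text_column"
            )
```

If that is roughly right, the constructor must be receiving `None` for these columns, which would mean the column mapping from the form never reaches the preprocessor, regardless of what I put in the parameters JSON.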
Parameters and HTTP Request
I used the following configuration for the training parameters:
```json
{
  "lr": 0.0005,
  "epochs": 3,
  "batch_size": 1,
  "warmup_ratio": 0.1,
  "gradient_accumulation": 8,
  "optimizer": "adamw_torch",
  "scheduler": "linear",
  "weight_decay": 0.01,
  "max_grad_norm": 1,
  "seed": 42,
  "block_size": 2048,
  "use_peft": false,
  "lora_r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "logging_steps": 100,
  "evaluation_strategy": "epoch",
  "save_total_limit": 1,
  "save_strategy": "epoch",
  "auto_find_batch_size": false,
  "fp16": true,
  "use_int8": true,
  "model_max_length": 1445,
  "use_int4": false,
  "target_modules": "",
  "merge_adapter": false,
  "use_flash_attention_2": false,
  "disable_gradient_checkpointing": false,
  "model_ref": null,
  "dpo_beta": 0.1,
  "text_column": "text"
}
```
I also tried adding the following to the parameters, but with no luck:

```json
"prompt_column": "prompt",
"rejected_text_column": "rejected_text"
```
Here is the HTTP POST request payload sent to the `/create_project` endpoint of the AutoTrain Space:
```
autotrain_user: Hjallti
project_name: 1c3k-zrjq-wp4h
task: llm:dpo
base_model: berkeley-nest/Starling-LM-7B-alpha
hardware: A100 Large
data_files_training: (binary)
data_files_valid: (binary)
column_mapping: Enter column mapping...
params: {
  "lr": 0.00003,
  "epochs": 3,
  "text_column": "text",
  "batch_size": 2,
  "warmup_ratio": 0.1,
  "gradient_accumulation": 1,
  "optimizer": "adamw_torch",
  "scheduler": "linear",
  "weight_decay": 0,
  "max_grad_norm": 1,
  "seed": 42,
  "block_size": 1024,
  "use_peft": true,
  "lora_r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "logging_steps": -1,
  "evaluation_strategy": "epoch",
  "save_total_limit": 1,
  "save_strategy": "epoch",
  "auto_find_batch_size": false,
  "fp16": true,
  "use_int8": false,
  "model_max_length": 2048,
  "use_int4": true,
  "target_modules": "",
  "merge_adapter": false,
  "use_flash_attention_2": false,
  "disable_gradient_checkpointing": false,
  "model_ref": null,
  "dpo_beta": 0.1
}
```
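One detail that stands out to me: in the captured payload, the `column_mapping` field still contains the form's placeholder text rather than a JSON mapping, even though the server log above echoes `Column mapping: {'text': 'text', 'prompt': 'prompt', 'rejected_text': 'rejected_text'}`. If the field is supposed to be filled in explicitly, I would guess it expects something like this (my assumption based on the log, not documented behavior):

```json
{
  "text": "text",
  "prompt": "prompt",
  "rejected_text": "rejected_text"
}
```

I am not sure whether the placeholder is just how the browser dev tools render the field, or whether the mapping genuinely never got submitted.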
Dataset Structure
The dataset `train.csv` used for the project follows this format:

```
prompt, text, rejected_text
"hello", "hi nice to meet you", "leave me alone"
...
```
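In case the header itself is the problem (e.g. stray spaces around the column names), here is a quick way to sanity-check the file locally; this is just a minimal sketch I put together, not part of AutoTrain:

```python
# Quick header/content sanity check for the DPO CSV; not part of AutoTrain.
import pandas as pd

df = pd.read_csv("train.csv")
expected = {"prompt", "text", "rejected_text"}

# Stray whitespace in the header ("prompt, text, ...") would make a
# column mapping miss "text" and "rejected_text" entirely.
stripped = {c.strip() for c in df.columns}
missing = expected - stripped
assert not missing, f"missing columns: {missing}"

print(df.columns.tolist())
print(df.head())
```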
I followed the data format guidelines for the DPO trainer in the AutoTrain LLM documentation.
For what it's worth, the same dataset and model train without any issues when I run AutoTrain locally; I just don't have the necessary processing power, which is why I want to use the hosted AutoTrain on Hugging Face. This is the local command that works:
```bash
autotrain llm \
  --train \
  --model "berkeley-nest/Starling-LM-7B-alpha" \
  --project-name "my_autotrain_llm" \
  --data-path "data/" \
  --text-column "text" \
  --lr 5e-4 \
  --batch-size 1 \
  --epochs 3 \
  --block-size 2048 \
  --warmup-ratio 0.1 \
  --lora-r 16 \
  --lora-alpha 32 \
  --lora-dropout 0.05 \
  --weight-decay 0.01 \
  --gradient-accumulation 8 \
  --dpo-beta 0.1 \
  --fp16 \
  --model_max_length 1445 \
  --logging_steps 100 \
  --push-to-hub \
  --token "REDACTED" \
  --repo-id "Hjallti/my_autotrain_llm"
```
Request for Assistance
Has anyone faced a similar issue, or can anyone identify a potential cause from the information above? Any guidance on resolving this error or adjusting the dataset and parameters would be greatly appreciated.
Thank you in advance for your help!