Issue with Creating Project in AutoTrain LLM: Internal Server Error (HTTP 500)

Hello everyone,

I’ve recently encountered an issue while trying to create a project using the AutoTrain LLM platform and would appreciate any insights or assistance.

Issue Summary

While trying to set up a new language model training project using the AutoTrain LLM interface, I ran into an Internal Server Error (HTTP 500) after submitting the project creation form. The server’s response indicated a problem with the dataset column mapping or its processing.

Detailed Description

  • Hardware Used: A100 Large
  • Dataset Identifier: 1c3k-zrjq-wp4h (lm_training)
  • Data Format: CSV with columns ‘prompt’, ‘text’, and ‘rejected_text’
  • Error Message: The system raised a ValueError asking for either text_column, or prompt_column together with rejected_text_column.

Here are the relevant parts of the error log for clarity:

INFO    hardware: A100 Large
INFO    Dataset: 1c3k-zrjq-wp4h (lm_training)
Train data: [<tempfile.SpooledTemporaryFile object at 0x7f61dbff07c0>]
Valid data: []
Column mapping: {'text': 'text', 'prompt': 'prompt', 'rejected_text': 'rejected_text'}

INFO:     10.16.34.18:51514 - "POST /create_project HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/app/env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/app/env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/app/env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/app/env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/app/env/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/app/env/lib/python3.10/site-packages/fastapi/routing.py", line 274, in app
    raw_response = await run_endpoint_function(
  File "/app/env/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/app/src/autotrain/app.py", line 258, in handle_form
    dset.prepare()
  File "/app/src/autotrain/dataset.py", line 302, in prepare
    preprocessor = LLMPreprocessor(
  File "<string>", line 13, in __init__
  File "/app/src/autotrain/preprocessor/text.py", line 140, in __post_init__
    raise ValueError("Please provide either text_column or prompt_column and rejected_text_column")
ValueError: Please provide either text_column or prompt_column and rejected_text_column
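For reference, the check that raises this error presumably looks something like the sketch below. This is a minimal reconstruction based only on the ValueError text and the traceback; the real implementation in autotrain/preprocessor/text.py may differ.

from dataclasses import dataclass
from typing import Optional

# Hypothetical reconstruction of the column check in
# autotrain/preprocessor/text.py, inferred only from the ValueError above.
@dataclass
class LLMPreprocessor:
    text_column: Optional[str] = None
    prompt_column: Optional[str] = None
    rejected_text_column: Optional[str] = None

    def __post_init__(self):
        # Generic LM training needs text_column; DPO-style training needs
        # the prompt/rejected pair. If neither combination is supplied,
        # project creation fails before training starts.
        if self.text_column is None and (
            self.prompt_column is None or self.rejected_text_column is None
        ):
            raise ValueError(
                "Please provide either text_column or prompt_column and rejected_text_column"
            )

Given that the column mapping echoed in the log above contains all three keys, my guess is that the mapping isn't being passed through to the LLMPreprocessor constructor.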

Parameters and HTTP Request

I used the following configuration for the training parameters:

{
  "lr": 0.0005,
  "epochs": 3,
  "batch_size": 1,
  "warmup_ratio": 0.1,
  "gradient_accumulation": 8,
  "optimizer": "adamw_torch",
  "scheduler": "linear",
  "weight_decay": 0.01,
  "max_grad_norm": 1,
  "seed": 42,
  "block_size": 2048,
  "use_peft": false,
  "lora_r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "logging_steps": 100,
  "evaluation_strategy": "epoch",
  "save_total_limit": 1,
  "save_strategy": "epoch",
  "auto_find_batch_size": false,
  "fp16": true,
  "use_int8": true,
  "model_max_length": 1445,
  "use_int4": false,
  "target_modules": "",
  "merge_adapter": false,
  "use_flash_attention_2": false,
  "disable_gradient_checkpointing": false,
  "model_ref": null,
  "dpo_beta": 0.1,
  "text_column": "text"
}

I also tried adding these keys to the params:

  "prompt_column": "prompt",
  "rejected_text_column": "rejected_text"

But no luck.

HTTP POST request payload sent to the /create_project endpoint of the AutoTrain Space:

autotrain_user: Hjallti
project_name: 1c3k-zrjq-wp4h
task: llm:dpo
base_model: berkeley-nest/Starling-LM-7B-alpha
hardware: A100 Large
data_files_training: (binary)
data_files_valid: (binary)
column_mapping: Enter column mapping...
params: {
  "lr": 0.00003,
  "epochs": 3,
  "text_column": "text",
  "batch_size": 2,
  "warmup_ratio": 0.1,
  "gradient_accumulation": 1,
  "optimizer": "adamw_torch",
  "scheduler": "linear",
  "weight_decay": 0,
  "max_grad_norm": 1,
  "seed": 42,
  "block_size": 1024,
  "use_peft": true,
  "lora_r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "logging_steps": -1,
  "evaluation_strategy": "epoch",
  "save_total_limit": 1,
  "save_strategy": "epoch",
  "auto_find_batch_size": false,
  "fp16": true,
  "use_int8": false,
  "model_max_length": 2048,
  "use_int4": true,
  "target_modules": "",
  "merge_adapter": false,
  "use_flash_attention_2": false,
  "disable_gradient_checkpointing": false,
  "model_ref": null,
  "dpo_beta": 0.1
}
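In case it helps with reproduction, here is a rough way to replay the request outside the browser. This is a sketch only: the Space URL and any required auth are assumptions, and the field names simply mirror the captured payload above.

import json
import requests

# Minimal sketch of replaying the form POST (assumptions noted below).
url = "https://<your-autotrain-space>.hf.space/create_project"  # assumed URL

params = {
    "lr": 3e-5,
    "epochs": 3,
    "text_column": "text",
    "prompt_column": "prompt",
    "rejected_text_column": "rejected_text",
    # remaining keys as in the params block above
}

data = {
    "autotrain_user": "Hjallti",
    "project_name": "1c3k-zrjq-wp4h",
    "task": "llm:dpo",
    "base_model": "berkeley-nest/Starling-LM-7B-alpha",
    "hardware": "A100 Large",
    "column_mapping": json.dumps(
        {"text": "text", "prompt": "prompt", "rejected_text": "rejected_text"}
    ),
    "params": json.dumps(params),
}
files = {"data_files_training": open("train.csv", "rb")}

resp = requests.post(url, data=data, files=files)
print(resp.status_code, resp.text[:500])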

Dataset Structure

The train.csv file used for the project follows this format:

prompt,text,rejected_text
"hello","hi nice to meet you","leave me alone"
...

I followed the data format guidelines for the DPO trainer in the AutoTrain LLM documentation.
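As a quick local sanity check that the column names match what the trainer expects (minimal sketch; assumes train.csv is in the working directory):

import pandas as pd

# Verify that the three columns the DPO task expects exist in the CSV.
df = pd.read_csv("train.csv")
expected = {"prompt", "text", "rejected_text"}
missing = expected - set(df.columns)
if missing:
    raise SystemExit(f"Missing columns: {sorted(missing)}")
print(df.head())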

FYI, the same dataset and model work fine when I run AutoTrain locally, but I don't have the necessary processing power, so I want to use AutoTrain on Hugging Face. The local command was:

autotrain llm \
  --train \
  --model "berkeley-nest/Starling-LM-7B-alpha" \
  --project-name "my_autotrain_llm" \
  --data-path "data/" \
  --text-column "text" \
  --lr 5e-4 \
  --batch-size 1 \
  --epochs 3 \
  --block-size 2048 \
  --warmup-ratio 0.1 \
  --lora-r 16 \
  --lora-alpha 32 \
  --lora-dropout 0.05 \
  --weight-decay 0.01 \
  --gradient-accumulation 8 \
  --dpo-beta 0.1 \
  --fp16 \
  --model_max_length 1445 \
  --logging_steps 100 \
  --push-to-hub \
  --token "REDACTED" \
  --repo-id "Hjallti/my_autotrain_llm"

Request for Assistance

Has anyone faced a similar issue or can identify a potential cause based on the provided information? Any guidance on how to resolve this error or adjust the dataset and parameters would be greatly appreciated.

Thank you in advance for your help!

Hello @abhishek, could you please give this a look when you have a chance? I think it could be affecting many users, not just me.

Are you selecting LLM DPO from the dropdown?

Yes, I am selecting LLM DPO.

Can you share your dataset? If it's private, you can also share it (if possible) with autotrain@hf.co.

Thanks, I’ve sent the dataset to that address, using the same subject line as this thread’s title.

What was the issue with the data?