The 500 error likely stems from a problem in the preprocessing, the model setup, or the interaction with your environment. Here’s a checklist of suggestions for debugging and resolving the issue:
1. Dataset Format
- Check tokenization:
  - Ensure that the tokens and tags in the CSV files are properly formatted as lists.
  - Double-check that the quotes (`"`) around `tokens` and `tags` are not interfering with reading the file as lists in Python.
- Recommended Fix: Instead of storing lists as strings in CSV, store them as lists in JSONL (JSON Lines) format. Example:
```json
{"tokens": ["ist", "lebt", "Herr", "Berlin", "030", "Siemens", ".", "E-Mail-Adresse", "Telefonnummer"], "tags": ["O", "O", "O", "LOCATION", "PHONE_NUMBER", "ORGANIZATION", "O", "O", "O"]}
```
Use JSONL for better compatibility with frameworks like Hugging Face.
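For reference, here is a minimal conversion sketch; it assumes your CSV columns are literally named `tokens` and `tags` and hold Python-style list strings, and the file names are placeholders for your own:
```python
import ast
import json

import pandas as pd

def csv_to_jsonl(csv_path, jsonl_path):
    """Convert a CSV with stringified token/tag lists into JSONL."""
    df = pd.read_csv(csv_path)
    with open(jsonl_path, "w", encoding="utf-8") as f:
        for _, row in df.iterrows():
            record = {
                # literal_eval safely turns "['ist', 'lebt', ...]" into a real list
                "tokens": ast.literal_eval(row["tokens"]),
                "tags": ast.literal_eval(row["tags"]),
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

csv_to_jsonl("train.csv", "train.jsonl")
csv_to_jsonl("validate.csv", "validate.jsonl")
```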
2. Loading the Dataset
If you’re using Hugging Face’s datasets library, ensure the dataset is correctly loaded. For a CSV:
```python
from datasets import load_dataset

data_files = {"train": "train.csv", "validation": "validate.csv"}
dataset = load_dataset("csv", data_files=data_files)
```
If your tokens and tags are strings, you may need to parse them:
```python
import ast

def preprocess_data(example):
    # ast.literal_eval safely parses the stringified lists
    # (prefer it over eval, which executes arbitrary code)
    example["tokens"] = ast.literal_eval(example["tokens"])
    example["tags"] = ast.literal_eval(example["tags"])
    return example

dataset = dataset.map(preprocess_data)
```
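Note that if you convert to JSONL as suggested above, this parsing step disappears entirely, since JSON arrays are loaded as real lists. A sketch, assuming the `train.jsonl`/`validate.jsonl` files produced by the conversion above:
```python
from datasets import load_dataset

# JSONL rows already contain real lists, so no string parsing is needed
data_files = {"train": "train.jsonl", "validation": "validate.jsonl"}
dataset = load_dataset("json", data_files=data_files)
print(dataset["train"][0]["tokens"])  # e.g. ["ist", "lebt", ...]
```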