Getting error - trainer.train()

DeepanSrinivasan · March 11, 2024, 2:59am

Hello, I am trying to fine-tune a GPT-2 model with my data. When I call the train() method, I get the error below.

KeyError: 'Invalid key. Only three types of key are available: (1) string, (2) integers for backend Encoding, and (3) slices for data subsetting.'

Below is my code:

cleaned_text = ""

from transformers import GPT2Tokenizer, Trainer, TrainingArguments

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Add a new padding token to the tokenizer
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

# Tokenize the text using GPT-2 tokenizer
tokenized_text = tokenizer(cleaned_text, truncation=True, padding=True)

training_args = TrainingArguments(
    output_dir="./gpt2-finetuned",
    overwrite_output_dir=True,
    num_train_epochs=3,
    per_device_train_batch_size=2,
    save_steps=10_000,
    save_total_limit=2,
)

# `model` is not defined in the code as posted
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_text,
    tokenizer=tokenizer,
)

trainer.train()

wonderfullymade · April 23, 2024, 1:02pm

Hello, how were you able to fix this issue?

op97 · April 24, 2024, 9:06pm

Any updates on this, please?
I'm facing the same problem when fine-tuning Mistral 7B.

Chahnwoo · April 25, 2024, 12:05am

Could you provide the full error log?

ghitak · June 3, 2024, 9:53am

Hello, how did you solve it?
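
A possible explanation, offered as a sketch rather than a confirmed fix: the posted code passes the tokenizer output directly as train_dataset. The Trainer fetches examples by integer index, and the object returned by the slow GPT2Tokenizer appears to accept only string keys and slices, which would match the KeyError text above. One common workaround is to wrap the encodings in a map-style dataset that has __len__ and __getitem__. The class name EncodedTextDataset and the token ids below are hypothetical and only illustrate the shape of the data. The sketch assumes the tokenizer was called on a list of texts, so each value is a list of per-example lists, and it uses plain Python instead of torch.utils.data.Dataset to stay dependency-free.

```python
class EncodedTextDataset:
    """Map-style dataset over a dict of per-example lists (hypothetical helper)."""

    def __init__(self, encodings):
        # encodings looks like {"input_ids": [[...], ...], "attention_mask": [[...], ...]}
        self.encodings = dict(encodings)

    def __len__(self):
        return len(self.encodings["input_ids"])

    def __getitem__(self, idx):
        # Build one example by taking row `idx` from every field.
        item = {key: values[idx] for key, values in self.encodings.items()}
        # For causal language modelling the labels are the input ids;
        # padded positions are set to -100 so the loss ignores them.
        item["labels"] = [
            tok if mask == 1 else -100
            for tok, mask in zip(item["input_ids"], item["attention_mask"])
        ]
        return item


# Stand-in for tokenizer(list_of_texts, truncation=True, padding=True);
# these token ids are made up for illustration.
fake_encodings = {
    "input_ids": [[10, 11, 12], [20, 21, 0]],
    "attention_mask": [[1, 1, 1], [1, 1, 0]],
}
dataset = EncodedTextDataset(fake_encodings)
first_example = dataset[0]
```

With real data, an instance of such a class would be passed as train_dataset in place of tokenized_text. Whether this resolves the error in a given setup depends on details not shown in the thread, such as how model is created and whether its embeddings were resized after adding the [PAD] token.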
