Hello, I am trying to fine-tune a GPT-2 model on my own data. When I call the train() method, I get the error below.
KeyError: 'Invalid key. Only three types of key are available: (1) string, (2) integers for backend Encoding, and (3) slices for data subsetting.'
Below is my code:
cleaned_text = ""
from transformers import GPT2Tokenizer, Trainer, TrainingArguments
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# Add a new padding token to the tokenizer
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
# Tokenize the text using GPT-2 tokenizer
tokenized_text = tokenizer(cleaned_text, truncation=True, padding=True)
training_args = TrainingArguments(
    output_dir="./gpt2-finetuned",
    overwrite_output_dir=True,
    num_train_epochs=3,
    per_device_train_batch_size=2,
    save_steps=10_000,
    save_total_limit=2,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_text,
    tokenizer=tokenizer,
)
trainer.train()
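
For context, one likely cause (not confirmed for this exact setup): Trainer indexes train_dataset with integers, whereas the object returned by the slow GPT2Tokenizer only accepts string keys and slices, which matches the KeyError above. A minimal sketch of a wrapper that supports integer indexing follows. The class name EncodingsDataset and the sample token ids are hypothetical, and the sketch assumes the tokenizer was called on a list of texts, so that each value is a list of equal-length lists.

```python
class EncodingsDataset:
    """Map-style dataset over tokenizer output (a dict of equal-length lists).

    Hypothetical helper: it only needs __len__ and integer __getitem__,
    which is what a map-style dataset is expected to provide.
    """

    def __init__(self, encodings):
        # dict() also accepts mapping-like objects with string keys.
        self.encodings = dict(encodings)

    def __len__(self):
        return len(self.encodings["input_ids"])

    def __getitem__(self, idx):
        # Pick the idx-th example out of every field.
        item = {key: values[idx] for key, values in self.encodings.items()}
        # Causal LM labels are the input ids; padded positions are set
        # to -100 so the loss ignores them.
        item["labels"] = [
            token if mask == 1 else -100
            for token, mask in zip(item["input_ids"], item["attention_mask"])
        ]
        return item


# Stand-in for tokenizer(list_of_texts, truncation=True, padding=True);
# the ids below are made up for illustration.
sample_encodings = {
    "input_ids": [[15496, 995, 50257], [17250, 50257, 50257]],
    "attention_mask": [[1, 1, 0], [1, 0, 0]],
}
dataset = EncodingsDataset(sample_encodings)
print(len(dataset), dataset[0])
```

With something like this, train_dataset=EncodingsDataset(tokenized_text) could be passed in place of tokenized_text. Since a [PAD] token was added, the model's embeddings would probably also need resizing with model.resize_token_embeddings(len(tokenizer)) before training.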