train_encoded_inputs = tokenizer(x_train["TextData"].tolist(), padding=True, truncation=True, max_length=512, return_tensors="pt")
test_encoded_inputs = tokenizer(x_test["TextData"].tolist(), padding=True, truncation=True, max_length=512, return_tensors="pt")
train_encoded_inputs
training_args = TrainingArguments(
    output_dir="./results",          # model output directory
    learning_rate=2e-5,
    num_train_epochs=3,              # number of epochs
    per_device_train_batch_size=4,   # training batch size
    per_device_eval_batch_size=4,    # evaluation batch size
    warmup_steps=500,                # number of warmup steps
    weight_decay=0.01,               # weight decay rate
    logging_steps=10,
    fp16=True
)
trainer = Trainer(
    model=model,                         # the model to train
    args=training_args,                  # training arguments
    train_dataset=train_encoded_inputs,  # training data
    eval_dataset=test_encoded_inputs,    # test data
    tokenizer=tokenizer,
)

When I call trainer.train(), I receive:

KeyError                                  Traceback (most recent call last)
in <cell line: 1>()
----> 1 trainer.train()

7 frames
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py in __getitem__(self, item)
    258             return {key: self.data[key][item] for key in self.data.keys()}
    259         else:
--> 260             raise KeyError(
    261                 "Invalid key. Only three types of key are available: "
    262                 "(1) string, (2) integers for backend Encoding, and (3) slices for data subsetting."

KeyError: 'Invalid key. Only three types of key are available: (1) string, (2) integers for backend Encoding, and (3) slices for data subsetting.'
It looks like you're passing the inputs prepared by the tokenizer directly to the Trainer. That isn't allowed: the tokenizer returns a dictionary (actually a BatchEncoding), while the Trainer only accepts a Hugging Face dataset or a PyTorch dataset.
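One common way to bridge the gap is to wrap the `BatchEncoding` and your labels in a small `torch.utils.data.Dataset`. A minimal sketch (the label variables `y_train` / `y_test` are assumptions here — use whatever holds your integer class labels):

```python
import torch


class TextDataset(torch.utils.data.Dataset):
    """Wraps a tokenizer BatchEncoding plus a list of labels into a PyTorch Dataset."""

    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Return one example as a dict of tensors, which is what the Trainer expects.
        item = {key: val[idx] for key, val in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item


# Hypothetical usage with the variables from your snippet:
# train_dataset = TextDataset(train_encoded_inputs, y_train.tolist())
# test_dataset = TextDataset(test_encoded_inputs, y_test.tolist())
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_dataset, eval_dataset=test_dataset,
#                   tokenizer=tokenizer)
```

Because each `__getitem__` call returns a plain dict of tensors, the Trainer's default data collator can batch it without hitting the `BatchEncoding.__getitem__` key check that raised your `KeyError`.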