Type of dataset in Trainer class

Hi, I was going through the documentation and got confused by this snippet:

trainer = Trainer(
    model=model,                  # the instantiated :hugs: Transformers model to be trained
    args=training_args,           # training arguments, defined above
    train_dataset=train_dataset,  # training dataset
    eval_dataset=test_dataset,    # evaluation dataset
)

I couldn’t work out what type train_dataset should be, or how the target for the loss calculation is selected.
The "Fine-tuning in native TensorFlow 2" section also passes no target value. Am I missing something?
model.fit(train_dataset, epochs=2, steps_per_epoch=115)

Thank you

For more context, the question is about this page: https://huggingface.co/transformers/training.html

I also got confused by this bit of the documentation, but I think this code expects datasets like the ones provided by Hugging Face’s NLP package.

I think they are all based on PyTorch’s Dataset class, but I could be mistaken.

Try to use one of the datasets provided by their NLP package and check if it works correctly.

Hope this helps! :hugs:

Hi @suyash21, this post has some explanation of the dataset expected by Trainer.

Trainer is meant to be used with PyTorch, so in this case train_dataset needs to be a PyTorch dataset. TFTrainer would expect a TF dataset. The doc page is a bit unclear (the types are right in the signature, but too short or wrong in the enumeration). I’ll send a fix for this later today.
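
To make the "where does the target come from" part concrete, here is a minimal sketch of a PyTorch dataset that could be passed as train_dataset. The class name ToyTextDataset, the token ids, and the labels are all made up for illustration; in practice the encodings would come from a tokenizer. The idea is that each item is a dict whose keys match the model's forward arguments, and the target travels along under the "labels" key.

```python
import torch
from torch.utils.data import Dataset


class ToyTextDataset(Dataset):
    """Hypothetical map-style dataset; the name and the data are illustrative only."""

    def __init__(self, encodings, labels):
        # encodings: dict of lists (one entry per example), like tokenizer output
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Build one example as a dict of tensors keyed by forward-argument name
        item = {key: torch.tensor(values[idx]) for key, values in self.encodings.items()}
        # The target is stored under "labels" so the model can compute the loss
        item["labels"] = torch.tensor(self.labels[idx])
        return item


# Made-up token ids standing in for real tokenizer output
encodings = {
    "input_ids": [[101, 2023, 102], [101, 2008, 102]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
}
train_dataset = ToyTextDataset(encodings, labels=[1, 0])
```

As far as I understand it, Trainer batches these dicts and passes them to the model as keyword arguments, and the model returns a loss when labels is present, which is why no separate target argument shows up in the Trainer call. On the TensorFlow side, model.fit with a tf.data.Dataset expects each element to be an (inputs, targets) pair, so the target is already inside train_dataset there too.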