Validation vs. Test with the Transformers Trainer

Hi, newbie here. I’m fine-tuning RoBERTa-base with the code attached below and have some questions:

  1. Have I understood correctly that the training process used here will contaminate the dataset used for evaluation? Or could the validation data here be considered test data, so that I could simply do an 80/20 split? I’ve read that I need a validation set when doing hyperparameter tuning; is such tuning done behind the scenes (when calculating the loss)?
  2. When I run trainer.evaluate, will it automatically use the evaluation dataset? For final testing, should I specify the last part of the dataset, in this case split='train[90%:]'?

A lot of tutorials call the evaluation dataset “test data”, which confused me a bit, and few tutorials go through the process of first validating, then testing.

import datasets
from transformers import TrainingArguments, Trainer

# slice the same shuffled CSV: first 80% for training, next 10% for validation
# (the text column still needs to be tokenized, e.g. with .map, before training)
train_data = datasets.load_dataset('csv', data_files = 'datasets/all_shuffled.csv', split='train[:80%]')
vali_data = datasets.load_dataset('csv', data_files = 'datasets/all_shuffled.csv', split='train[80%:90%]')

training_args = TrainingArguments(
    output_dir = 'roberta',
    num_train_epochs=4,
    per_device_train_batch_size = 4,
    gradient_accumulation_steps = 16,    
    per_device_eval_batch_size= 8,
    evaluation_strategy = 'no',
    save_strategy = 'no',
    disable_tqdm = False,
    # with both strategies set to 'no', no checkpoints are saved during training,
    # so load_best_model_at_end has nothing to restore here
    load_best_model_at_end=True,
    warmup_steps=500,
    weight_decay=0.01,
    logging_steps = 8,
    fp16 = False,
    logging_dir='roberta/logs',
    dataloader_num_workers = 8,
    run_name = 'roberta-classification'
)

trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_data,
    eval_dataset=vali_data
)

trainer.train()
trainer.evaluate()  # with no argument, this reuses eval_dataset (the validation split)

Hello,
Just like you, all of these questions came to my mind when I was trying to fine-tune my first model. I was using the test dataset as a validation dataset, but that is wrong: you have to split the data into three datasets (train, validation, test) and make sure the test dataset stays unseen during training and tuning, to guarantee there is no data leak.

So, let’s make sure that the validation dataset is not the same as the test dataset.

The evaluate method takes the tokenized test dataset as a parameter, so you can run it once at the very end on data the model has never seen. I tried this with BERT, but it should work the same way for your model.
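To make that concrete, here is a minimal sketch building on the code in your post. The train[90%:] slice and the assumption that the test rows are tokenized the same way as the others are mine; the point is just that the test split is only touched once, at the very end.

# held-out 10% that the Trainer never sees during training or tuning
test_data = datasets.load_dataset('csv', data_files = 'datasets/all_shuffled.csv', split='train[90%:]')

# trainer is the Trainer instance from your post (same tokenization assumed for all splits)
val_metrics = trainer.evaluate()            # no argument: reuses eval_dataset (validation split)
test_metrics = trainer.evaluate(test_data)  # final, one-time test score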

I hope that answers your questions.


Thanks for your input, this makes sense!