Transformers and Hyperparameter search using Optuna

Steps with transformers are not epochs. In my understanding, 1 step = 1 batch.
So if you have a train set with, say, 1000 samples, and you train in batches of 64 samples, it will take you 16 steps (16 batches, the last one partial) to push the whole train set through training once, i.e. to do 1 epoch. In this example, 1 epoch = 16 steps. If you do the math for your script, you should find that, in your case, 500 steps = 1 epoch.
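A quick way to sanity-check the step/epoch arithmetic yourself (the numbers here are just the ones from the example above):

```python
import math

def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    # One step = one batch; the last batch may be smaller than
    # batch_size, so we round up.
    return math.ceil(num_samples / batch_size)

# Example from above: 1000 samples, batch size 64
print(steps_per_epoch(1000, 64))  # -> 16
```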

Not sure I understand which .json file you are referring to. What is the complete path?
Log files I have are organized like this:

There is one directory for every run (what Optuna calls “trials”), and every run contains its checkpoints; in my case they are saved every 32 steps, because here 1 epoch = 32 steps. In every checkpoint, trainer_state.json contains all the info for that specific run up to and including the given checkpoint.

If you are looking for a single, overall log file with all the runs together, no, I don’t have one. I get one trainer_state.json for every checkpoint of every run.
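If you want everything in one place, you could gather the latest trainer_state.json from each run yourself. A minimal sketch, assuming a layout like `output_dir/run-<n>/checkpoint-<step>/trainer_state.json` (the `run-*` and `checkpoint-*` names are just illustrative; your directory names may differ):

```python
import json
from pathlib import Path

def collect_latest_states(output_dir: str) -> dict:
    """Return {run_name: parsed trainer_state.json of its last checkpoint}."""
    states = {}
    for run_dir in sorted(Path(output_dir).glob("run-*")):
        # Since trainer_state.json is cumulative, the checkpoint with
        # the highest step number covers the whole run so far.
        checkpoints = sorted(
            run_dir.glob("checkpoint-*"),
            key=lambda p: int(p.name.split("-")[-1]),
        )
        if not checkpoints:
            continue
        state_file = checkpoints[-1] / "trainer_state.json"
        if state_file.exists():
            states[run_dir.name] = json.loads(state_file.read_text())
    return states
```

Because each trainer_state.json already contains the full history up to that checkpoint, reading only the last checkpoint per run is enough to reconstruct an overall picture.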