Hugging Face has great support for hyperparameter optimization. However, it is tied to a Trainer, which means that a masked-language-model pretraining run would be optimized separately from fine-tuning (in my case, classification). As a result, the best model with regard to the pretraining objective is fixed first, and only a subset of training parameters can then be optimized during fine-tuning (a greedy search). The implicit assumption is that the best parameters for the pretraining task (e.g. the number of hidden layers) are also the best for the fine-tuning task.
By contrast, I would like to optimize hyperparameters end-to-end (each trial would run both pretraining and fine-tuning), with the metric that selects the best model architecture coming from performance on the classification task. This can be achieved with custom code, but I was wondering whether this use case is appealing enough to the average user to be considered as a potential new feature by the Hugging Face team?
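To make the request concrete, here is a minimal sketch of the end-to-end loop I have in mind, using plain random search and stub `pretrain`/`finetune` functions (both are hypothetical stand-ins; in practice each would wrap a Trainer run, one for MLM pretraining and one for classification, and the search backend could just as well be Optuna or Ray Tune):

```python
import random

def pretrain(hparams):
    # Stand-in for MLM pretraining: would build a model config from hparams
    # (e.g. num_hidden_layers), train, and return a checkpoint reference.
    return f"ckpt-{hparams['num_hidden_layers']}layers"

def finetune(checkpoint, hparams):
    # Stand-in for classification fine-tuning: would load the pretrained
    # checkpoint, fine-tune, and return a validation metric. Here a toy
    # surrogate score is returned so the sketch is self-contained.
    lr_penalty = 0.05 * abs(hparams["learning_rate"] - 3e-5) / 3e-5
    return 0.8 + 0.01 * hparams["num_hidden_layers"] - lr_penalty

def end_to_end_search(n_trials=10, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        hparams = {
            # Architecture choices are shared by both stages, which is
            # exactly what per-stage HPO cannot explore jointly.
            "num_hidden_layers": rng.choice([4, 6, 8, 12]),
            "learning_rate": rng.choice([1e-5, 3e-5, 5e-5]),
        }
        checkpoint = pretrain(hparams)            # stage 1: pretraining
        score = finetune(checkpoint, hparams)     # stage 2: classification
        if best is None or score > best[0]:       # select on the downstream metric
            best = (score, hparams)
    return best

best_score, best_hparams = end_to_end_search()
```

The key point is that model selection happens only after stage 2, so architecture hyperparameters are judged by classification performance rather than by the pretraining loss.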
Thanks for your consideration.