Trainer API weights initialization

fracapuano · February 9, 2025, 4:55pm

Ciao! I am working with (and loving!!!) the Trainer API to train a custom transformer architecture. So far everything has been fairly smooth, but I was wondering how you guys handle weight initialization.

I can initialize the weights as I please with the usual model.apply(model._init_weights), but I was wondering which function you use under the hood (if any). Thank you!
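For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of it in plain PyTorch. `TinyModel`, its layer sizes, and the `std=0.02` value are hypothetical choices for illustration, not something taken from the Trainer API or from the poster's model.

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for a custom transformer block."""

    def __init__(self, d_model: int = 16):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def _init_weights(self, module: nn.Module) -> None:
        # nn.Module.apply calls this once per submodule.
        if isinstance(module, nn.Linear):
            # std=0.02 is an assumed value, chosen only for the example.
            nn.init.normal_(module.weight, mean=0.0, std=0.02)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        elif isinstance(module, nn.LayerNorm):
            nn.init.ones_(module.weight)
            nn.init.zeros_(module.bias)


torch.manual_seed(0)
model = TinyModel()
# apply() visits every submodule recursively, then the model itself.
model.apply(model._init_weights)
```

Modules that match neither branch are left untouched, so they keep whatever their own constructor produced.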

John6666 · February 10, 2025, 12:08pm

I think many people either initialize with empty weights or use from_pretrained to load the initial values from a base model.

fracapuano · February 10, 2025, 1:05pm

Thank you for taking the time to answer @John6666 – Not really sure it applies to my situation tho.

I am interested in weight initialization for training, and specifically in what the Trainer API's defaults are when it comes to initializing the weights (I mean, which distributions it uses, or whether it just falls back to the PyTorch default init).
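One way to settle this empirically, whatever the Trainer does or does not do, is to inspect the weights of the model right before training and compare them with what PyTorch's own default would give. The sketch below assumes the default `nn.Linear` initialization in current PyTorch releases, which draws weights uniformly from `[-1/sqrt(fan_in), 1/sqrt(fan_in)]`. `describe_linear_init` is a hypothetical helper written for this example, not part of any library.

```python
import math

import torch
from torch import nn


def describe_linear_init(layer: nn.Linear) -> dict:
    """Compare a layer's weight statistics with PyTorch's default uniform init."""
    # Assumed default: U(-bound, bound) with bound = 1 / sqrt(fan_in).
    bound = 1.0 / math.sqrt(layer.in_features)
    weight = layer.weight.detach()
    return {
        "std": weight.std().item(),
        "max_abs": weight.abs().max().item(),
        "default_bound": bound,
        # Standard deviation of U(-b, b) is b / sqrt(3).
        "default_std": bound / math.sqrt(3.0),
    }


torch.manual_seed(0)
stats = describe_linear_init(nn.Linear(256, 256))
print(stats)
```

If `std` sits close to `default_std` and `max_abs` never exceeds `default_bound`, the layer most likely still carries the PyTorch default. A normal initialization would instead show values well beyond a few standard deviations' worth of that bound pattern, with no hard cutoff.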
