Trainer API weights initialization

Ciao :hugs: I am working (and loving!!!) with the Trainer API to train a custom transformer architecture. So far everything has been fairly smooth, but I was wondering how you guys handle weight initialization.

I can initialize the weights as I please with the usual model.apply(model._init_weights), but I was wondering which function you use under the hood (if any). Thank you!
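For reference, the model.apply(model._init_weights) pattern mentioned above looks roughly like the sketch below. TinyModel is a hypothetical stand-in for the custom architecture, and the normal distribution with std=0.02 is only an illustrative choice, not something taken from the Trainer API.

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for a custom transformer-style model."""

    def __init__(self, vocab_size=100, hidden=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.proj = nn.Linear(hidden, hidden)
        self.norm = nn.LayerNorm(hidden)

    def _init_weights(self, module):
        # nn.Module.apply calls this once for every submodule (and the model itself).
        if isinstance(module, (nn.Linear, nn.Embedding)):
            # Illustrative choice: normal init with a small std (assumption).
            nn.init.normal_(module.weight, mean=0.0, std=0.02)
        if isinstance(module, nn.Linear) and module.bias is not None:
            nn.init.zeros_(module.bias)
        if isinstance(module, nn.LayerNorm):
            nn.init.ones_(module.weight)
            nn.init.zeros_(module.bias)

    def forward(self, input_ids):
        return self.norm(self.proj(self.embed(input_ids)))


torch.manual_seed(0)
model = TinyModel()
# Overwrites whatever init the layers received in their constructors.
model.apply(model._init_weights)
```

Since apply walks every submodule, the isinstance checks are what decide which layers get which distribution; layers that match none of them keep whatever init they already had.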


I think many people either initialize with empty weights or use from_pretrained to load the base model's weights as the starting values.

Thank you for taking the time to answer @John6666 – Not really sure it applies to my situation tho.

I am interested in weight initialization for training, and specifically in what the Trainer API defaults are when it comes to initializing the weights (I mean, which distributions are used, or is it just a fallback to the PyTorch default init?)
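One way to check empirically which init a model actually ended up with, wherever it happened, is to look at the parameter statistics right before training starts. The sketch below is only an illustration: summarize_init is a hypothetical helper, and the comparison uses a plain nn.Linear, whose default PyTorch init draws weights uniformly within ±1/sqrt(in_features).

```python
import math

import torch
from torch import nn


def summarize_init(model):
    """Return {param_name: (mean, std, min, max)} for every parameter (hypothetical helper)."""
    stats = {}
    for name, param in model.named_parameters():
        t = param.detach().float()
        stats[name] = (
            t.mean().item(),
            t.std(unbiased=False).item(),
            t.min().item(),
            t.max().item(),
        )
    return stats


torch.manual_seed(0)
layer = nn.Linear(64, 32)  # left untouched, so it carries PyTorch's default init
stats = summarize_init(layer)

# Bound of the default uniform init for nn.Linear.
bound = 1.0 / math.sqrt(layer.in_features)

for name, (mean, std, lo, hi) in stats.items():
    print(f"{name}: mean={mean:.4f} std={std:.4f} range=[{lo:.4f}, {hi:.4f}] bound={bound:.4f}")
```

If the same summary on the custom model shows values that stay inside bounds like these, the layers are most likely still on the PyTorch defaults; a roughly Gaussian spread with a fixed std would instead point to a custom _init_weights having run.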
