Hi, I am working with (and loving!) the Trainer API to train a custom transformer architecture. So far everything has been fairly smooth, but I was wondering how you handle weight initialization.
I can initialize the weights as I please with the usual model.apply(model._init_weights), but I was wondering which function you use under the hood (if any). Thank you!
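For context, here is a minimal sketch of the model.apply(model._init_weights) pattern I mean. TinyModel, its layer sizes, and the normal(0, 0.02) scheme are hypothetical stand-ins for illustration, not my real architecture and not necessarily what any library does by default.

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for a custom transformer architecture."""

    def __init__(self, hidden: int = 16, vocab: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.proj = nn.Linear(hidden, hidden)
        self.norm = nn.LayerNorm(hidden)

    def _init_weights(self, module: nn.Module) -> None:
        # Assumed scheme: normal(0, 0.02) for Linear/Embedding weights,
        # zero biases, and LayerNorm reset to weight=1, bias=0.
        if isinstance(module, (nn.Linear, nn.Embedding)):
            nn.init.normal_(module.weight, mean=0.0, std=0.02)
            if isinstance(module, nn.Linear) and module.bias is not None:
                nn.init.zeros_(module.bias)
        elif isinstance(module, nn.LayerNorm):
            nn.init.ones_(module.weight)
            nn.init.zeros_(module.bias)


torch.manual_seed(0)
model = TinyModel()
# nn.Module.apply calls the function on every submodule and on the model itself.
model.apply(model._init_weights)
```

After this call the model can be handed to the Trainer as usual; the question is what happens if I skip this step.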
Thank you for taking the time to answer, @John6666. I'm not really sure it applies to my situation, though.
I am interested in weight initialization for training, specifically in what the Trainer API's defaults are when it comes to initializing the weights (that is, which distributions are used, or whether it just falls back to PyTorch's default init).
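In the meantime, one way I could check this empirically is to look at the parameter statistics right after building the model and before training starts. This is only a rough sketch: summarize_init is a hypothetical helper name of my own, and the printed numbers describe whatever init the module happens to have, not a documented default.

```python
import torch
from torch import nn


def summarize_init(model: nn.Module) -> dict:
    """Return {parameter name: (mean, std)} for every parameter of the model."""
    stats = {}
    with torch.no_grad():
        for name, param in model.named_parameters():
            values = param.detach().float()
            # unbiased=False avoids NaN for single-element parameters.
            stats[name] = (values.mean().item(), values.std(unbiased=False).item())
    return stats


# Example: inspect a freshly constructed layer with no custom init applied.
layer = nn.Linear(8, 4)
for name, (mean, std) in summarize_init(layer).items():
    print(f"{name}: mean={mean:.4f} std={std:.4f}")
```

Running the same helper before and after model.apply(model._init_weights) would show whether my own init is actually what the model ends up with.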