Getting random results with BERT

Hi
I have modified a BERT model slightly by adding small `Linear` layers between its layers. The only random part is the random initialization done for these layers, as below:

W = torch.nn.init.xavier_normal_(tensor, gain=math.sqrt(2))

I have put these initializations in the definition of each layer. Between runs I get a 3-4% difference in results, and I would really appreciate your help fixing this issue.
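For reference, run-to-run variance like this usually comes from unseeded RNG state rather than the initializer itself. Below is a minimal seeding sketch, similar in spirit to (but not identical to) the `set_seed()` helper that `transformers` uses in `run_glue.py`; the function name `set_full_seed` and the `draw` helper are illustrative assumptions, not library code:

```python
import random

import numpy as np
import torch

def set_full_seed(seed: int) -> None:
    # Seed every RNG that can affect training, not just torch's.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Optional: trade some speed for reproducible cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

def draw(seed: int) -> torch.Tensor:
    # With the seed fixed right before the init call, the same seed
    # always produces the same Xavier-initialized weights.
    set_full_seed(seed)
    return torch.nn.init.xavier_normal_(torch.empty(4, 4), gain=2 ** 0.5)
```

If the seed is fixed once at the top of the script, any change in the *order* or *number* of random calls before your layers are built will still shift the initialization, which can explain a few percent of metric drift.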

  • Could you please advise how I should handle initialization on top of a BERT model? Should it all go inside `_init_weights()`, and does it make a difference whether it is done inside that function or elsewhere in the model?
  • Hugging Face's run_glue.py fixes the random seeds at the top of the script; should I re-set them each time before the initialization?
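On the first question, a plain-PyTorch analogue of the `_init_weights` pattern looks like the sketch below: all initialization lives in one method that is `apply()`-ed over submodules, which mirrors how `PreTrainedModel` initializes weights. The class names `ExtraLinear` and `TinyModel` are illustrative assumptions, not actual library or author code:

```python
import math

import torch
import torch.nn as nn

class ExtraLinear(nn.Linear):
    """Marker subclass for the small Linear layers inserted between BERT layers."""

class TinyModel(nn.Module):
    def __init__(self, hidden: int = 8):
        super().__init__()
        self.backbone = nn.Linear(hidden, hidden)  # stands in for a BERT layer
        self.extra = ExtraLinear(hidden, hidden)   # the added layer
        # Mirrors PreTrainedModel.init_weights: one pass over all submodules.
        self.apply(self._init_weights)

    def _init_weights(self, module: nn.Module) -> None:
        # Only the added layers get the custom init; everything else
        # keeps its default (or pretrained) weights.
        if isinstance(module, ExtraLinear):
            nn.init.xavier_normal_(module.weight, gain=math.sqrt(2))
            nn.init.zeros_(module.bias)
```

Keeping the custom init inside `_init_weights()` (rather than scattered through layer constructors) has one practical benefit: it runs at a single, predictable point, so a seed set just before model construction reproducibly determines the new layers' weights.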

I am really struggling with this issue and would greatly appreciate your help.
@sgugger @stas

Hi
I confirm that the same issue also happens with an unmodified BERT model. I ran it on MRPC for 3 epochs; here are the two results:

[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:01,542 >>   epoch                        =                3.0
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:01,542 >>   eval_average_metrics         = 0.8355071710663065
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:01,542 >>   eval_mem_cpu_alloc_delta     =                0MB
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:01,543 >>   eval_mem_cpu_peaked_delta    =                1MB
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:01,543 >>   eval_mem_gpu_alloc_delta     =                0MB
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:01,543 >>   eval_mem_gpu_peaked_delta    =              264MB
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:01,543 >>   mrpc_eval_accuracy           =             0.8088
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:01,543 >>   mrpc_eval_combined_score     =             0.8355
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:01,543 >>   mrpc_eval_f1                 =             0.8622
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:01,543 >>   mrpc_eval_loss               =             0.5017
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:01,543 >>   mrpc_eval_runtime            =         0:00:00.31
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:01,543 >>   mrpc_eval_samples_per_second =            656.083

and

[INFO|trainer_pt_utils.py:722] 2021-04-26 00:46:42,272 >> ***** test metrics *****
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:42,272 >>   epoch                        =                3.0
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:42,272 >>   eval_average_metrics         = 0.8656245715069244
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:42,272 >>   eval_mem_cpu_alloc_delta     =                0MB
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:42,272 >>   eval_mem_cpu_peaked_delta    =                2MB
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:42,272 >>   eval_mem_gpu_alloc_delta     =                0MB
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:42,272 >>   eval_mem_gpu_peaked_delta    =              264MB
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:42,272 >>   mrpc_eval_accuracy           =             0.8431
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:42,272 >>   mrpc_eval_combined_score     =             0.8656
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:42,272 >>   mrpc_eval_f1                 =             0.8881
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:42,272 >>   mrpc_eval_loss               =             0.4185
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:42,272 >>   mrpc_eval_runtime            =         0:00:00.32
[INFO|trainer_pt_utils.py:727] 2021-04-26 00:46:42,272 >>   mrpc_eval_samples_per_second =            623.473

This now looks to me like a library issue. @sgugger, I would really appreciate your comments on it. Thanks.

Please do not at-mention moderators of the forum in every single one of your messages.

Hi
Sure. The issue was resolved by upgrading to the 4.6.0 dev version of transformers. Thank you!