Hi, I have a question.
I tried to load weights from a checkpoint as follows:
from transformers import AutoConfig, RobertaForMaskedLM

config = AutoConfig.from_pretrained("./saved/checkpoint-480000")
model = RobertaForMaskedLM(config=config)
Is this the right way?
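For reference, here is a minimal sketch of the difference between a config-only init and `from_pretrained`, using a hypothetical tiny config (not my real model) so it runs self-contained:

```python
import tempfile

import torch
from transformers import RobertaConfig, RobertaForMaskedLM

# Hypothetical tiny config, just for illustration (not my real model).
config = RobertaConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                       num_attention_heads=2, intermediate_size=64)
model = RobertaForMaskedLM(config)

with tempfile.TemporaryDirectory() as ckpt:
    model.save_pretrained(ckpt)  # writes config.json plus the weight file

    # Config-only init: same architecture, but freshly initialized weights.
    random_model = RobertaForMaskedLM(RobertaConfig.from_pretrained(ckpt))

    # from_pretrained: same architecture AND the saved weights.
    loaded_model = RobertaForMaskedLM.from_pretrained(ckpt)

    saved = model.roberta.embeddings.word_embeddings.weight
    print(torch.equal(loaded_model.roberta.embeddings.word_embeddings.weight, saved))  # True
    print(torch.equal(random_model.roberta.embeddings.word_embeddings.weight, saved))  # fresh random init
```

If my understanding is right, the config-only constructor only recreates the architecture with random weights, so `from_pretrained` would be needed to actually continue from checkpoint-480000.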
It seems training speed is slower than before, and the training process crashed after some steps…
anaconda3/envs/pytorch/lib/python3.7/site-packages/transformers/trainer.py:263: FutureWarning: Passing `prediction_loss_only` as a keyword argument is deprecated and won't be possible in a future version. Use `args.prediction_loss_only` instead. Setting `args.prediction_loss_only=True
FutureWarning,
0%| | 0/2755530 [00:00<?, ?it/s] anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/parallel/_functions.py:61: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
warnings.warn('Was asked to gather along dimension 0, but all '
0%| | 10000/2755530 [10:53:37<2855:04:31, 3.74s/it] anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/parallel/_functions.py:61: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
warnings.warn('Was asked to gather along dimension 0, but all '
1%| | 20000/2755530 [21:44:42<2934:49:34, 3.86s/it] anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/parallel/_functions.py:61: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
warnings.warn('Was asked to gather along dimension 0, but all '
1%| | 30000/2755530 [32:35:52<2922:14:07, 3.86s/it] anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/parallel/_functions.py:61: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
warnings.warn('Was asked to gather along dimension 0, but all '
1%| | 32292/2755530 [35:05:09<3263:20:29, 4.31s/it]
I could not find what went wrong, but the process was gone…
BTW, I started training with transformers version 3.1.0 and then stopped it.
I upgraded transformers to 3.4.0 and restarted training, because with 3.1.0 I could not even resume training from the checkpoint.
Could you give me hints for debugging?
Thanks in advance.