I’m getting the following error with AlbertForMaskedLM:
view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
It's triggered during the backward pass, and I thought it was due to the .view used in the loss calculation.
A fix is mentioned in this issue: https://github.com/huggingface/transformers/issues/4406
Either add .contiguous() before the .view call, or replace .view with .reshape.
I’ve tried both (.view is used in the loss calculation of AlbertForMaskedLM and also in AlbertAttention), but I’m still getting the error.
It doesn’t occur immediately, but a little way into the warmup phase (356 batches in, batch size = 8).
Any help would be really appreciated. I’m not sure what else I can do; maybe there is another .view hiding somewhere…
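For anyone trying to reproduce the failure mode in isolation: .view only works on tensors whose memory layout is contiguous, which is why both suggested fixes exist. This is a generic sketch of the difference, not code taken from AlbertForMaskedLM:

```python
import torch

# Transposing makes a tensor non-contiguous: its strides no longer
# describe a flat row-major layout, so .view cannot reinterpret it
# in place and raises the RuntimeError from this post.
x = torch.randn(4, 3).t()
print(x.is_contiguous())  # False

# x.view(-1) would fail here. Both fixes work:
flat_a = x.contiguous().view(-1)  # copy into contiguous memory first
flat_b = x.reshape(-1)            # copies only when it has to
```

The catch (and likely why the error persists) is that the offending .view may not be in the model code you patched; any tensor produced upstream by transpose/permute/slice and later passed to a .view elsewhere can trigger it.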
Using:
- pytorch 1.6
- transformers 3.1 (version on master)

Full trace:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-36-3435b262f1ae> in <module>
----> 1 trainer.train()
~/ml/projects/consulting/pretrain_bert/transformers/src/transformers/trainer.py in train(self, model_path, trial)
741 continue
742
--> 743 tr_loss += self.training_step(model, inputs)
744 self.total_flos += self.floating_point_ops(inputs)
745
~/ml/projects/consulting/pretrain_bert/transformers/src/transformers/trainer.py in training_step(self, model, inputs)
1058
1059 if self.args.fp16 and _use_native_amp:
-> 1060 self.scaler.scale(loss).backward()
1061 elif self.args.fp16 and _use_apex:
1062 with amp.scale_loss(loss, self.optimizer) as scaled_loss:
~/anaconda3/envs/fastai2_me/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
183 products. Defaults to ``False``.
184 """
--> 185 torch.autograd.backward(self, gradient, retain_graph, create_graph)
186
187 def register_hook(self, hook):
~/anaconda3/envs/fastai2_me/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
125 Variable._execution_engine.run_backward(
126 tensors, grad_tensors, retain_graph, create_graph,
--> 127 allow_unreachable=True) # allow_unreachable flag
128
129
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.