AlbertForMaskedLM error: "view size is not compatible..."

I’m getting the following error with AlbertForMaskedLM:

view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

It's triggered during the backward pass, and I thought it was due to the .view used in the loss calculation.
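For context, this is the generic error .view raises when the tensor isn't contiguous, which is why I suspected the loss calculation in the first place. A minimal example, nothing to do with Albert itself, just to show where the message comes from:

```python
import torch

x = torch.randn(2, 3, 4)
y = x.transpose(1, 2)          # transposing makes the tensor non-contiguous

try:
    y.view(-1)                 # fails: view needs compatible strides
except RuntimeError as e:
    print(e)                   # "view size is not compatible with input tensor's size and stride..."

y.contiguous().view(-1)        # works: copy to contiguous memory first
y.reshape(-1)                  # works: reshape only copies when it has to
```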

A fix is mentioned in this issue: https://github.com/huggingface/transformers/issues/4406

The suggestion is to add .contiguous() before the .view call, or to replace .view with .reshape.

I've tried both (.view is used in the loss calculation of AlbertForMaskedLM and also in AlbertAttention), but I'm still getting the error.
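For reference, this is roughly the change I tried in the loss calculation (just a sketch; the exact variable names in the transformers source may differ slightly):

```python
# Sketch of the masked-LM loss inside AlbertForMaskedLM.forward (names approximate).
#
# before:
# masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
#
# after (equivalently, prediction_scores.contiguous().view(...)):
masked_lm_loss = loss_fct(
    prediction_scores.reshape(-1, self.config.vocab_size),
    labels.reshape(-1),
)
```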

The error doesn't occur immediately, but a little way into the warmup phase (356 batches in, batch size = 8).

Any help would be really appreciated; I'm not sure what else I can do. Maybe there is another .view hiding somewhere…
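One thing I'm planning to try next is PyTorch's anomaly detection, so that the error in backward also reports the forward op that created the failing node. Something like this, a rough sketch running one batch with my model outside the Trainer:

```python
import torch

# With anomaly detection on, a RuntimeError raised during backward also prints
# the traceback of the forward operation that produced the failing node, which
# should point at the .view that is actually responsible.
torch.autograd.set_detect_anomaly(True)

# Hypothetical standalone forward/backward with my model and one batch of inputs;
# normally the Trainer does this internally.
outputs = model(**inputs)
loss = outputs[0]   # first element is the loss when labels are passed
loss.backward()
```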

Using:

  • PyTorch 1.6
  • transformers 3.1 (the version on master, installed from source)

Full trace:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-36-3435b262f1ae> in <module>
----> 1 trainer.train()

~/ml/projects/consulting/pretrain_bert/transformers/src/transformers/trainer.py in train(self, model_path, trial)
    741                     continue
    742 
--> 743                 tr_loss += self.training_step(model, inputs)
    744                 self.total_flos += self.floating_point_ops(inputs)
    745 

~/ml/projects/consulting/pretrain_bert/transformers/src/transformers/trainer.py in training_step(self, model, inputs)
   1058 
   1059         if self.args.fp16 and _use_native_amp:
-> 1060             self.scaler.scale(loss).backward()
   1061         elif self.args.fp16 and _use_apex:
   1062             with amp.scale_loss(loss, self.optimizer) as scaled_loss:

~/anaconda3/envs/fastai2_me/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    183                 products. Defaults to ``False``.
    184         """
--> 185         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    186 
    187     def register_hook(self, hook):

~/anaconda3/envs/fastai2_me/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
    125     Variable._execution_engine.run_backward(
    126         tensors, grad_tensors, retain_graph, create_graph,
--> 127         allow_unreachable=True)  # allow_unreachable flag
    128 
    129 

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

I have the same problem now. Have you found any solution for it?