AlbertForMaskedLM error: "view size is not compatible..."

I’m getting the following error with AlbertForMaskedLM:

view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

It's triggered during the backward pass, and I thought it was due to the .view used in the loss calculation.
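For context, this is the generic error .view raises when the tensor isn't contiguous, which is why I suspected the loss calculation in the first place. A minimal example, nothing to do with Albert itself, just to show where the message comes from:

```python
import torch

x = torch.randn(2, 3, 4)
y = x.transpose(1, 2)          # transposing makes the tensor non-contiguous

try:
    y.view(-1)                 # fails: view needs compatible strides
except RuntimeError as e:
    print(e)                   # "view size is not compatible with input tensor's size and stride..."

y.contiguous().view(-1)        # works: copy to contiguous memory first
y.reshape(-1)                  # works: reshape only copies when it has to
```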

A fix is mentioned in this issue: https://github.com/huggingface/transformers/issues/4406

The suggestion is to add .contiguous() before the .view call, or to replace .view with .reshape.

I've tried both (.view is used in the loss calculation of AlbertForMaskedLM and also in AlbertAttention), but I'm still getting the error.
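For reference, this is roughly the change I tried in the loss calculation (just a sketch; the exact variable names in the transformers source may differ slightly):

```python
# Sketch of the masked-LM loss inside AlbertForMaskedLM.forward (names approximate).
#
# before:
# masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
#
# after (equivalently, prediction_scores.contiguous().view(...)):
masked_lm_loss = loss_fct(
    prediction_scores.reshape(-1, self.config.vocab_size),
    labels.reshape(-1),
)
```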

The error doesn't occur immediately, but a little way into the warmup phase (356 batches in, batch size = 8).

Any help would be really appreciated; I'm not sure what else I can do. Maybe there is another .view hiding somewhere…
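One thing I'm planning to try next is PyTorch's anomaly detection, so that the error in backward also reports the forward op that created the failing node. Something like this, a rough sketch running one batch with my model outside the Trainer:

```python
import torch

# With anomaly detection on, a RuntimeError raised during backward also prints
# the traceback of the forward operation that produced the failing node, which
# should point at the .view that is actually responsible.
torch.autograd.set_detect_anomaly(True)

# Hypothetical standalone forward/backward with my model and one batch of inputs;
# normally the Trainer does this internally.
outputs = model(**inputs)
loss = outputs[0]   # first element is the loss when labels are passed
loss.backward()
```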

Using:

  • PyTorch 1.6
  • transformers 3.1 (the version on master, installed from source)

Full trace:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-36-3435b262f1ae> in <module>
----> 1 trainer.train()

~/ml/projects/consulting/pretrain_bert/transformers/src/transformers/trainer.py in train(self, model_path, trial)
    741                     continue
    742 
--> 743                 tr_loss += self.training_step(model, inputs)
    744                 self.total_flos += self.floating_point_ops(inputs)
    745 

~/ml/projects/consulting/pretrain_bert/transformers/src/transformers/trainer.py in training_step(self, model, inputs)
   1058 
   1059         if self.args.fp16 and _use_native_amp:
-> 1060             self.scaler.scale(loss).backward()
   1061         elif self.args.fp16 and _use_apex:
   1062             with amp.scale_loss(loss, self.optimizer) as scaled_loss:

~/anaconda3/envs/fastai2_me/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    183                 products. Defaults to ``False``.
    184         """
--> 185         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    186 
    187     def register_hook(self, hook):

~/anaconda3/envs/fastai2_me/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
    125     Variable._execution_engine.run_backward(
    126         tensors, grad_tensors, retain_graph, create_graph,
--> 127         allow_unreachable=True)  # allow_unreachable flag
    128 
    129 

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

I have the same problem now. Have you found any solution for it?