Adafactor step() gives me a "tensors must be 2-D" error

Hi, I want to try the Adafactor optimizer, but when I call optimizer.step() I get the error shown below.

I instantiate the Adafactor optimizer like this:

optimizer = Adafactor(model.parameters(), scale_parameter=True, relative_step=True, warmup_init=True, lr=None)

When I then execute optimizer.step(), I get the following error:

Traceback (most recent call last):
  File "", line 371, in <module>
  File "~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/optim/", line 88, in wrapper
    return func(*args, **kwargs)
  File "~/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/", line 584, in step
    update = self._approx_sq_grad(exp_avg_sq_row, exp_avg_sq_col)
  File "~/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/", line 515, in _approx_sq_grad
    return, c_factor.unsqueeze(0))
RuntimeError: tensors must be 2-D

The shapes of r_factor and c_factor, before and after unsqueeze, are as follows:

r_factor:  torch.Size([512, 1])
c_factor:  torch.Size([512, 10])
r_factor.unsqueeze(-1):  torch.Size([512, 1, 1])
c_factor.unsqueeze(0):  torch.Size([1, 512, 10])
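
As a sanity check on the shapes above, here is a minimal sketch in plain Python (no torch needed) that mimics how unsqueeze inserts a size-1 axis into a shape. The helper unsqueeze_shape is a hypothetical name made up for illustration, not a PyTorch function, and it only tracks shapes, not tensor values.

```python
def unsqueeze_shape(shape, dim):
    """Return `shape` with a size-1 axis inserted at `dim`.

    Hypothetical helper that mirrors the shape rule of Tensor.unsqueeze:
    `dim` may range from -(ndim + 1) to ndim, and a negative `dim`
    is interpreted as `dim + ndim + 1`.
    """
    ndim = len(shape)
    if not -(ndim + 1) <= dim <= ndim:
        raise IndexError(f"dim {dim} out of range for a {ndim}-D shape")
    if dim < 0:
        dim += ndim + 1
    return shape[:dim] + (1,) + shape[dim:]


# Shapes taken from the printout above.
r_shape = (512, 1)
c_shape = (512, 10)

r_unsqueezed = unsqueeze_shape(r_shape, -1)
c_unsqueezed = unsqueeze_shape(c_shape, 0)

print("r_factor.unsqueeze(-1):", r_unsqueezed, "ndim =", len(r_unsqueezed))
print("c_factor.unsqueeze(0): ", c_unsqueezed, "ndim =", len(c_unsqueezed))
```

Both results have three dimensions, which would be consistent with the "tensors must be 2-D" message if the call on line 515 only accepts matrices.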

So after unsqueeze the tensors are actually 3-D, not 2-D.
But I don’t know where things went wrong.

Does this error have something to do with the following warning?

~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/ UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
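
As far as I can tell, the warning is only about how integer division rounds negative values. Here is a small plain-Python sketch of the difference it describes; trunc_div and floor_div are hypothetical helper names for illustration, not torch functions.

```python
import math


def trunc_div(a, b):
    """Divide and round toward zero (the 'trunc' behavior named in the warning)."""
    return math.trunc(a / b)


def floor_div(a, b):
    """Divide and round toward negative infinity (true floor division)."""
    return a // b


# The two agree for positive operands and differ for negative ones.
for a, b in [(7, 2), (-7, 2)]:
    print(f"{a} / {b}: trunc = {trunc_div(a, b)}, floor = {floor_div(a, b)}")
```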

Thanks in advance.