Hi, I want to try the Adafactor optimizer, but calling optimizer.step() raises the error shown below.
I instantiate the optimizer like this:
from transformers.optimization import Adafactor
optimizer = Adafactor(model.parameters(), scale_parameter=True, relative_step=True, warmup_init=True, lr=None)
When I call optimizer.step(), I get the following error:
Traceback (most recent call last):
  File "pretrain_v2.py", line 371, in <module>
    optimizer.step()
  File "~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "~/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/optimization.py", line 584, in step
    update = self._approx_sq_grad(exp_avg_sq_row, exp_avg_sq_col)
  File "~/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/optimization.py", line 515, in _approx_sq_grad
    return torch.mm(r_factor.unsqueeze(-1), c_factor.unsqueeze(0))
RuntimeError: tensors must be 2-D
The shapes of r_factor and c_factor, before and after unsqueeze, are as follows:
r_factor: torch.Size([512, 1])
c_factor: torch.Size([512, 10])
r_factor.unsqueeze(-1): torch.Size([512, 1, 1])
c_factor.unsqueeze(0): torch.Size([1, 512, 10])
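For context, these shapes are what one would expect if the parameter being updated were 3-D with shape (512, 1, 10), for example the weight of a Conv1d layer with 1 input channel, 512 output channels, and kernel size 10. That is an assumption based on the shapes alone, not something the traceback confirms. The sketch below uses random stand-in tensors (not the real optimizer state) to reproduce the failure, and shows a broadcasting product that tolerates the extra dimension. Whether other versions of the library compute it this way is not verified here.

```python
import torch

# Hypothetical stand-ins with the shapes reported above; the real values
# come from the optimizer state and are not reproduced here.
r_factor = torch.rand(512, 1) + 0.1
c_factor = torch.rand(512, 10) + 0.1

row = r_factor.unsqueeze(-1)  # shape (512, 1, 1)
col = c_factor.unsqueeze(0)   # shape (1, 512, 10)

# torch.mm only accepts 2-D matrices, so 3-D inputs raise a RuntimeError.
try:
    torch.mm(row, col)
    mm_failed = False
except RuntimeError as err:
    mm_failed = True
    print("torch.mm failed:", err)

# Broadcasting alternative (a sketch, assuming the target shape is (512, 1, 10)):
# (512, 1, 1) * (512, 1, 10) -> (512, 1, 10)
approx = r_factor.unsqueeze(-1) * c_factor.unsqueeze(-2)
print(approx.shape)
```

For a plain 2-D parameter the same broadcasting product gives the outer product that torch.mm would have returned, so the two forms only differ once a third dimension appears.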
So the tensors passed to torch.mm really are 3-D rather than 2-D,
but I don't know where things went wrong.
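One way to narrow this down might be to list every parameter with more than two dimensions, since those are the ones that could produce 3-D factors. The model below is a hypothetical stand-in (a Conv1d followed by a Linear layer), not the model from pretrain_v2.py; only the loop over named_parameters() is the point.

```python
import torch.nn as nn

# Hypothetical stand-in model; replace with the real model being trained.
model = nn.Sequential(
    nn.Conv1d(1, 512, kernel_size=10),  # weight has shape (512, 1, 10)
    nn.Linear(512, 4),                  # weight has shape (4, 512)
)

# Collect the parameters whose tensors have more than two dimensions.
high_dim = [
    (name, tuple(param.shape))
    for name, param in model.named_parameters()
    if param.dim() > 2
]

for name, shape in high_dim:
    print(name, shape)
```

Here only the convolution weight is reported; biases and the linear weight are 1-D or 2-D.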
Does this error have anything to do with the following warning?
~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
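To illustrate what the warning itself describes, the two rounding modes it names differ only for negative quotients. This is a small sketch with made-up values, assuming a PyTorch version in which torch.div accepts rounding_mode; it does not show whether the warning is related to the RuntimeError above.

```python
import torch

# Made-up example values: one negative and one positive dividend.
a = torch.tensor([-7, 7])
b = torch.tensor([2, 2])

# 'trunc' rounds toward zero, which is what the warning says floor_divide did.
trunc = torch.div(a, b, rounding_mode="trunc")

# 'floor' rounds toward negative infinity (true floor division).
floor = torch.div(a, b, rounding_mode="floor")

print(trunc.tolist())  # [-3, 3]
print(floor.tolist())  # [-4, 3]
```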
Thanks in advance.