TransformerXL run_clm.py: "grad can be implicitly created only for scalar outputs"

Hello,
I am trying to execute run_clm.py for TransformerXL (transfo-xl-wt103) but get the following error:

0% 0/10170 [00:00<?, ?it/s]/usr/local/lib/python3.6/dist-packages/transformers/modeling_transfo_xl.py:445: UserWarning: This overload of nonzero is deprecated:
    nonzero()
Consider using one of the following signatures instead:
    nonzero(*, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
  indices_i = mask_i.nonzero().squeeze()
Traceback (most recent call last):
  File "language-modeling/run_clm.py", line 352, in <module>
    main()
  File "language-modeling/run_clm.py", line 321, in main
    model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 775, in train
    tr_loss += self.training_step(model, inputs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 1126, in training_step
    loss.backward()
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 126, in backward
    grad_tensors_ = _make_grads(tensors, grad_tensors)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 50, in _make_grads
    raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
0% 0/10170 [00:00<?, ?it/s]

I haven’t changed the original (run_clm.py) code. I am using:

!python language-modeling/run_clm.py \
    --output_dir='/content/drive/My Drive/XL-result' \
    --model_type=transfo-xl-wt103 \
    --model_name_or_path=transfo-xl-wt103 \
    --save_total_limit=2 \
    --num_train_epochs=2 \
    --do_train \
    --train_file='/content/drive/My Drive/train.txt' \
    --do_eval \
    --validation_file='/content/drive/My Drive/test.txt' \
    --per_device_train_batch_size=4 \
    --learning_rate 5e-5 \
    --seed 42 \
    --overwrite_output_dir \
    --block_size 125

Any help would be much appreciated!

Have you read this?

If you haven’t changed the run_clm.py code, something else must be different. What versions of Python, PyTorch and transformers are you using?

Are you using CPU, GPU or TPU? Are you using DataParallel (whatever that is)?
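
For what it's worth, the error itself just means that loss.backward() was called on a tensor that is not a scalar, for example a vector of per-token or per-device losses. A minimal standalone illustration in plain PyTorch (not specific to run_clm.py, just to show the mechanism):

    import torch

    # A non-scalar "loss", e.g. one value per example, per token or per device
    losses = torch.randn(4, requires_grad=True)

    # losses.backward()  # raises: grad can be implicitly created only for scalar outputs

    # Reducing to a scalar first lets autograd work as usual
    losses.mean().backward()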

@lcrivell were you able to solve this problem? I am also getting this error while fine-tuning bert-base-cased on the MNLI dataset.

@rgwatwormhill how do I use this in the Trainer?
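
In case it helps, one workaround sometimes used for a non-scalar training loss is to subclass Trainer and average the loss before returning it. This is only a sketch, assuming a transformers version whose Trainer.compute_loss accepts a return_outputs argument (older versions use compute_loss(self, model, inputs)); the class name ReducedLossTrainer is made up for illustration:

    from transformers import Trainer

    class ReducedLossTrainer(Trainer):
        # Averages a non-scalar loss (e.g. per-token losses) so backward() gets a scalar.
        def compute_loss(self, model, inputs, return_outputs=False):
            outputs = model(**inputs)
            loss = outputs[0]   # first element is the (possibly unreduced) loss
            loss = loss.mean()  # reduce to a scalar
            return (loss, outputs) if return_outputs else loss

You would then instantiate ReducedLossTrainer in place of Trainer in run_clm.py (or your own script) with the same arguments.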