Data type error while trying to fine-tune DeBERTa v3 Large

I was trying to follow this: microsoft/deberta-v3-large · Hugging Face
Command ran:

python3 run_glue.py  --model_name_or_path microsoft/deberta-v3-large --task_name mnli   --do_train   --do_eval   --evaluation_strategy steps   --max_seq_length 256   --warmup_steps 50   --learning_rate 6e-5   --num_train_epochs 3   --output_dir outputv3 --overwrite_output_dir   --logging_steps 10000   --logging_dir outputv3/

while in this folder: /transformers/examples/pytorch/text-classification$

For the fuller sequence of issues that occurred and got solved, see this earlier reply from nielsr (post 9 in topic 11486): "I've created a notebook for you: Google Colab"

Thanks for reading.

Hi @NDugar,

You get the error below because the dataset you use for fine-tuning does not have a validation split. As you can see on the imdb dataset card, this dataset only has the train, unsupervised, and test splits available natively. You can either modify the dataset yourself to extract your own evaluation split, or use the test split by replacing --do_eval with --do_predict in your script.

I hope this helped you!

Error currently discussed:

Traceback (most recent call last):
  File "run_glue.py", line 568, in <module>
    main()
  File "run_glue.py", line 422, in main
    raise ValueError("--do_eval requires a validation dataset")
ValueError: --do_eval requires a validation dataset
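If you go with the first option, here is a minimal sketch (my own illustration with the datasets library, not code from run_glue.py; adjust the split size as you like) of carving a validation set out of the imdb train split:

# Sketch only: create a validation split from imdb's train split,
# since the dataset has no native "validation" split.
from datasets import load_dataset

imdb = load_dataset("imdb")  # provides "train", "test", "unsupervised"
split = imdb["train"].train_test_split(test_size=0.1, seed=42)

train_dataset = split["train"]
validation_dataset = split["test"]  # use this as the evaluation set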

In the command I ran, the problem seems to be related to data types, and I am trying to figure out what I need to change. The command is written below:

python3 run_glue.py  --model_name_or_path microsoft/deberta-v3-large --task_name mnli   --do_train   --do_eval   --evaluation_strategy steps   --max_seq_length 256   --warmup_steps 50   --learning_rate 6e-5   --num_train_epochs 3   --output_dir outputv3 --overwrite_output_dir   --logging_steps 10000   --logging_dir outputv3/

@NDugar, I’m a little confused. The command you just shared with me does not match the one in the Google Colab you shared in your initial post.

If I understand correctly, I should ignore your Colab notebook? The command you executed is:

python3 run_glue.py \
    --model_name_or_path microsoft/deberta-v3-large \
    --task_name mnli \
    --do_train \
    --do_eval \
    --evaluation_strategy steps \
    --max_seq_length 256 \
    --warmup_steps 50 \
    --learning_rate 6e-5 \
    --num_train_epochs 3 \
    --output_dir outputv3 \
    --overwrite_output_dir \
    --logging_steps 10000 \
    --logging_dir outputv3/

and the error you get is:

inputGrad = _softmax_backward_data(grad_output, output, self.dim, output)
TypeError: _softmax_backward_data(): argument 'input_dtype' (position 4) must be torch.dtype, not Tensor
0%|

Yes, correct. Sorry for the confusion.

I couldn’t reproduce the error you’re having with microsoft/deberta-v3-small (because the large version doesn’t fit in Google Colab).

Can you confirm that it is the same problem as this issue?

If so, do you know how long ago you cloned the transformers library for your test? There have been changes to the file transformers/models/deberta_v2/modeling_deberta_v2.py 17 days ago that might have solved your problem.
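I haven’t checked the exact diff, but the TypeError suggests that newer PyTorch versions expect a dtype rather than a tensor as the fourth argument of _softmax_backward_data. A compatibility wrapper along these lines (my own sketch, not necessarily the code in the repo) would avoid the error on both old and new PyTorch:

# Rough illustration only: try the newer signature (dtype as last argument)
# and fall back to the older one (tensor as last argument) if it fails.
from torch import _softmax_backward_data

def _compat_softmax_backward(grad_output, output, dim, input_tensor):
    try:
        # Newer PyTorch: last positional argument is the input dtype.
        return _softmax_backward_data(grad_output, output, dim, input_tensor.dtype)
    except TypeError:
        # Older PyTorch: last positional argument is the input tensor itself.
        return _softmax_backward_data(grad_output, output, dim, input_tensor)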


No problem :wink:


Yes, same issue. I cloned it 5 days back, so I think I am on the latest version.

Thank you for the additional information.

As I can’t reproduce the errors with the small version, I would have to try with the large version, which I won’t have time to do today. I must admit that I have no idea where it could come from. I’ll get back to you as soon as possible.


Thank you. I was facing the same issue with DeBERTa v2, so I don’t think the problem lies with the model itself but rather with how they were both made.

I just tested with microsoft/deberta-v3-large and I don’t get any error :worried:. The training starts well for me.

I just added 2 more arguments --per_device_train_batch_size 2 --per_device_eval_batch_size 2 because otherwise I didn’t have enough VRAM.
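For reference, assuming everything else identical to your command, the full invocation I ran would look like:

python3 run_glue.py \
    --model_name_or_path microsoft/deberta-v3-large \
    --task_name mnli \
    --do_train \
    --do_eval \
    --evaluation_strategy steps \
    --max_seq_length 256 \
    --warmup_steps 50 \
    --learning_rate 6e-5 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --output_dir outputv3 \
    --overwrite_output_dir \
    --logging_steps 10000 \
    --logging_dir outputv3/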

I don’t have exactly the same setting as you, though. Would it be possible for you to try again with the same settings as me (first trying with the latest transformers version on master, then with Python 3.8.10, and finally with PyTorch 1.9.0)?

My setting:

- `transformers` version: 4.13.0.dev0
- Python version: 3.8.10
- PyTorch version (GPU?): 1.9.0+cu102 (True)
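If it helps, here is a quick snippet (nothing specific to this issue, just standard imports) to double-check those versions on your side:

import sys
import torch
import transformers

# Print the three versions relevant to this comparison.
print("Python:", sys.version.split()[0])
print("transformers:", transformers.__version__)
print("PyTorch:", torch.__version__)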

Can’t wait to see how it turns out on your end!

OK, will try this and reply.


I have the same transformers version, but I am on Python 3.6.9 and PyTorch 1.10.0+cu102.

I realised that my issue is related to CUDA, so I am closing this. Thank you for your help.
