Data type error while trying to fine-tune DeBERTa v3 Large

I was trying to follow this: microsoft/deberta-v3-large · Hugging Face
Command ran:

python3 run_glue.py  --model_name_or_path microsoft/deberta-v3-large --task_name mnli   --do_train   --do_eval   --evaluation_strategy steps   --max_seq_length 256   --warmup_steps 50   --learning_rate 6e-5   --num_train_epochs 3   --output_dir outputv3 --overwrite_output_dir   --logging_steps 10000   --logging_dir outputv3/

while in this folder: /transformers/examples/pytorch/text-classification$

For the fuller sequence of issues that occurred and got solved, see this earlier reply from nielsr (post 9 in topic 11486): "I've created a notebook for you: Google Colab"

Thanks for reading.

Hi @NDugar,

You get the error below because the dataset you use for fine-tuning does not have a validation split. As you can see on the imdb dataset card, this dataset only has the train, unsupervised, and test splits available natively. You can either modify the dataset yourself to extract your own evaluation split, or use the test split by replacing --do_eval with --do_predict in your script.

I hope this helped you!

Error currently discussed:

Traceback (most recent call last):
  File "run_glue.py", line 568, in <module>
    main()
  File "run_glue.py", line 422, in main
    raise ValueError("--do_eval requires a validation dataset")
ValueError: --do_eval requires a validation dataset
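If you go with the first option, here is a minimal sketch (my own illustration with the datasets library, not code from run_glue.py; adjust the split size as you like) of carving a validation set out of the imdb train split:

# Sketch only: create a validation split from imdb's train split,
# since the dataset has no native "validation" split.
from datasets import load_dataset

imdb = load_dataset("imdb")  # provides "train", "test", "unsupervised"
split = imdb["train"].train_test_split(test_size=0.1, seed=42)

train_dataset = split["train"]
validation_dataset = split["test"]  # use this as the evaluation set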

In the command I ran, the problem seems to be related to data types, and I am trying to figure out what I need to change. The command is written below:

python3 run_glue.py  --model_name_or_path microsoft/deberta-v3-large --task_name mnli   --do_train   --do_eval   --evaluation_strategy steps   --max_seq_length 256   --warmup_steps 50   --learning_rate 6e-5   --num_train_epochs 3   --output_dir outputv3 --overwrite_output_dir   --logging_steps 10000   --logging_dir outputv3/

@NDugar, I’m a little confused. The command you just shared with me does not match the one in the Google Colab you shared in your initial post.

If I understand correctly, I should ignore your Colab notebook? The command you executed is:

python3 run_glue.py \
    --model_name_or_path microsoft/deberta-v3-large \
    --task_name mnli \
    --do_train \
    --do_eval \
    --evaluation_strategy steps \
    --max_seq_length 256 \
    --warmup_steps 50 \
    --learning_rate 6e-5 \
    --num_train_epochs 3 \
    --output_dir outputv3 \
    --overwrite_output_dir \
    --logging_steps 10000 \
    --logging_dir outputv3/

and the error you get is:

inputGrad = _softmax_backward_data(grad_output, output, self.dim, output)
TypeError: _softmax_backward_data(): argument 'input_dtype' (position 4) must be torch.dtype, not Tensor
0%|

Yes, correct. Sorry for the confusion.

I couldn’t reproduce the error you’re having with microsoft/deberta-v3-small (because the large version doesn’t fit in Google Colab).

Can you confirm that it is the same problem as this issue?

If so, do you know how long ago you cloned the transformers library for your test? There have been changes to the file transformers/models/deberta_v2/modeling_deberta_v2.py 17 days ago that might have solved your problem.
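I haven’t checked the exact diff, but the TypeError suggests that newer PyTorch versions expect a dtype rather than a tensor as the fourth argument of _softmax_backward_data. A compatibility wrapper along these lines (my own sketch, not necessarily the code in the repo) would avoid the error on both old and new PyTorch:

# Rough illustration only: try the newer signature (dtype as last argument)
# and fall back to the older one (tensor as last argument) if it fails.
from torch import _softmax_backward_data

def _compat_softmax_backward(grad_output, output, dim, input_tensor):
    try:
        # Newer PyTorch: last positional argument is the input dtype.
        return _softmax_backward_data(grad_output, output, dim, input_tensor.dtype)
    except TypeError:
        # Older PyTorch: last positional argument is the input tensor itself.
        return _softmax_backward_data(grad_output, output, dim, input_tensor)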


No problem :wink:


Yes, same issue. I cloned it 5 days back, so I think I am on the latest version.

Thank you for the additional information.

As I can’t reproduce the errors with the small version, I would have to try with the large version, which I won’t have time to do today. I must admit that I have no idea where it could come from. I’ll get back to you as soon as possible.


Thank you. I was facing the same issue with DeBERTa v2, so I don’t think the problem lies with the model itself but rather with how they were both made.

I just tested with microsoft/deberta-v3-large and I don’t get any error :worried:. The training starts well for me.

I just added 2 more arguments --per_device_train_batch_size 2 --per_device_eval_batch_size 2 because otherwise I didn’t have enough VRAM.
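For reference, assuming everything else identical to your command, the full invocation I ran would look like:

python3 run_glue.py \
    --model_name_or_path microsoft/deberta-v3-large \
    --task_name mnli \
    --do_train \
    --do_eval \
    --evaluation_strategy steps \
    --max_seq_length 256 \
    --warmup_steps 50 \
    --learning_rate 6e-5 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --output_dir outputv3 \
    --overwrite_output_dir \
    --logging_steps 10000 \
    --logging_dir outputv3/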

I don’t have exactly the same setting as you, though. Would it be possible for you to try again with the same settings as me (first trying with the latest transformers version on master, then with Python 3.8.10, and finally with PyTorch 1.9.0)?

My setting:

- `transformers` version: 4.13.0.dev0
- Python version: 3.8.10
- PyTorch version (GPU?): 1.9.0+cu102 (True)
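If it helps, here is a quick snippet (nothing specific to this issue, just standard imports) to double-check those versions on your side:

import sys
import torch
import transformers

# Print the three versions relevant to this comparison.
print("Python:", sys.version.split()[0])
print("transformers:", transformers.__version__)
print("PyTorch:", torch.__version__)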

Can’t wait to see how it turns out on your end!

OK, will try this and reply.


I have the same transformers version, but I am on Python 3.6.9 and PyTorch 1.10.0+cu102.

I realised that my issue is related to CUDA, so I am closing this. Thank you for your help.
