Hey there,
I’ve been trying to fine-tune mdeberta-v3-base on Google Colab (Premium GPU). However, the model does not seem to learn: the metrics do not change at all across epochs.
Training of DeBERTa
Training arguments:

```python
import os
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./model-results',
    num_train_epochs=10,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=64,
    weight_decay=0.01,               # PyTorch AdamW default
    logging_dir=os.path.join('logs', 'DeBERTa-training-small'),
    logging_steps=100,               # default is 500
    disable_tqdm=False,
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='eval_f1',
    save_total_limit=5,
    fp16=True,
    report_to='wandb',
    learning_rate=5e-6,
)
```
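Since `metric_for_best_model='eval_f1'` requires an `eval_f1` metric, I pass a `compute_metrics` function to the `Trainer`. It looks roughly like the sketch below (the sklearn-based version here is an approximation of what I use):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Turn raw logits into the accuracy/F1/precision/recall values logged below."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average='binary'
    )
    return {
        'accuracy': accuracy_score(labels, preds),
        'f1': f1,
        'precision': precision,
        'recall': recall,
    }
```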
Dataset language: German
I’ve trained XLM-RoBERTa and BERT on the exact same data with the same parameters without any issue.
Training of XLM-RoBERTa
Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
---|---|---|---|---|---|---|
1 | 0.178500 | 0.109758 | 0.971864 | 0.960648 | 0.933843 | 0.989037 |
2 | 0.065900 | 0.095973 | 0.981794 | 0.974166 | 0.960185 | 0.988561 |
3 | 0.076800 | 0.077241 | 0.986097 | 0.980066 | 0.975898 | 0.984271 |
4 | 0.022800 | 0.080963 | 0.985601 | 0.979505 | 0.968328 | 0.990944 |
5 | 0.044600 | 0.075488 | 0.987918 | 0.982714 | 0.976471 | 0.989037 |
6 | 0.023200 | 0.089165 | 0.985932 | 0.979986 | 0.968357 | 0.991897 |
7 | 0.018500 | 0.097132 | 0.986925 | 0.981407 | 0.969317 | 0.993804 |
8 | 0.029200 | 0.110928 | 0.984111 | 0.977528 | 0.960442 | 0.995234 |
9 | 0.016000 | 0.106296 | 0.986428 | 0.980724 | 0.967532 | 0.994280 |
10 | 0.010700 | 0.104699 | 0.986594 | 0.980955 | 0.967981 | 0.994280 |
Additionally, if I increase the batch size to 16, I run out of CUDA memory (40 GB available).
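As a side note, I assume I could approximate an effective batch size of 16 via gradient accumulation instead (sketch below), but I would still expect a batch size of 16 to fit into 40 GB:

```python
# Assumption on my side: emulate a batch size of 16 without the OOM
# by accumulating gradients over two steps of 8.
training_args = TrainingArguments(
    output_dir='./model-results',
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,  # 8 * 2 = effective batch size of 16
    # ... remaining arguments as above ...
)
```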
Initialization of the tokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    'microsoft/mdeberta-v3-base', use_fast=True, max_length=1024
)
```
Initialization of the model:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    'microsoft/mdeberta-v3-base', num_labels=2
)
```
Data formatting:

```
[CLS] sequence 1 [SEP] sequence 2 [SEP]
```
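This layout comes from passing both sequences to the tokenizer as a pair. A minimal example (the sentences are placeholders):

```python
# Hypothetical sentence pair; encoding two texts together produces the
# [CLS] ... [SEP] ... [SEP] layout shown above.
encoded = tokenizer('Satz eins.', 'Satz zwei.', truncation=True)
print(tokenizer.decode(encoded['input_ids']))
# -> [CLS] Satz eins. [SEP] Satz zwei. [SEP] (roughly)
```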
The code runs in a Jupyter notebook on Google Colab. The task is binary classification of sequence pairs.
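For completeness, everything is wired together roughly like this (`train_dataset` and `eval_dataset` are placeholders for my tokenized splits):

```python
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: tokenized training split
    eval_dataset=eval_dataset,    # placeholder: tokenized validation split
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
```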
Is there an issue with the model, or did I make a mistake somewhere?
Thanks in advance
Best regards
SacrumDeus