Metrics of mdeberta-v3-base training stuck at the same level

Hey there,

I’ve been trying to train a model based on mdeberta-v3-base on Google Colab (Premium GPU). However, the model does not learn: the training does not progress, and the metrics stay at exactly the same level across all epochs.

Training of DeBERTa
[screenshot: training metrics, identical across all epochs]

Training Arguments:

import os
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./model-results',
    num_train_epochs=10,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=64,
    weight_decay=0.01,                   # TrainingArguments default is 0.0
    logging_dir=os.path.join('logs', 'DeBERTa-training-small'),
    logging_steps=100,                   # default is 500
    disable_tqdm=False,
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='eval_f1',
    save_total_limit=5,
    fp16=True,
    report_to='wandb',
    learning_rate=5e-6,
)

Dataset language: German

I’ve trained XLM-RoBERTa and BERT with the exact same data and parameters without any issue.

Training of XLM-RoBERTa

| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|-------|---------------|-----------------|----------|----------|----------|----------|
| 1 | 0.178500 | 0.109758 | 0.971864 | 0.960648 | 0.933843 | 0.989037 |
| 2 | 0.065900 | 0.095973 | 0.981794 | 0.974166 | 0.960185 | 0.988561 |
| 3 | 0.076800 | 0.077241 | 0.986097 | 0.980066 | 0.975898 | 0.984271 |
| 4 | 0.022800 | 0.080963 | 0.985601 | 0.979505 | 0.968328 | 0.990944 |
| 5 | 0.044600 | 0.075488 | 0.987918 | 0.982714 | 0.976471 | 0.989037 |
| 6 | 0.023200 | 0.089165 | 0.985932 | 0.979986 | 0.968357 | 0.991897 |
| 7 | 0.018500 | 0.097132 | 0.986925 | 0.981407 | 0.969317 | 0.993804 |
| 8 | 0.029200 | 0.110928 | 0.984111 | 0.977528 | 0.960442 | 0.995234 |
| 9 | 0.016000 | 0.106296 | 0.986428 | 0.980724 | 0.967532 | 0.994280 |
| 10 | 0.010700 | 0.104699 | 0.986594 | 0.980955 | 0.967981 | 0.994280 |

Additionally, if I increase the batch size to 16, I run out of CUDA memory (40 GB available).
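A possible workaround (untested on this exact setup) would be to keep the effective batch size at 16 while halving per-step memory through gradient accumulation, optionally combined with gradient checkpointing:

training_args = TrainingArguments(
    output_dir='./model-results',
    per_device_train_batch_size=8,   # fits in 40 GB
    gradient_accumulation_steps=2,   # 8 * 2 = effective batch size of 16
    gradient_checkpointing=True,     # recompute activations to save memory
    # ... remaining arguments as above ...
)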

Initialization of Tokenizer:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('microsoft/mdeberta-v3-base', use_fast=True, max_length=1024)

Initialization of Model:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained('microsoft/mdeberta-v3-base', num_labels=2)
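For completeness, the Trainer is wired up roughly like this (train_dataset and eval_dataset are placeholders for the tokenized splits). Note that metric_for_best_model='eval_f1' only works if compute_metrics returns an 'f1' key, which the Trainer prefixes with 'eval_':

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import Trainer

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average='binary')
    return {'accuracy': accuracy_score(labels, preds),
            'f1': f1,
            'precision': precision,
            'recall': recall}

trainer = Trainer(
    model=model,
    args=training_args,            # TrainingArguments from above
    train_dataset=train_dataset,   # placeholder: tokenized training split
    eval_dataset=eval_dataset,     # placeholder: tokenized validation split
    compute_metrics=compute_metrics,
)
trainer.train()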

Data formatting:

[CLS] sequence 1 [SEP] sequence 2 [SEP]
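This is the layout the tokenizer produces on its own when both sequences are passed as a pair, so no special tokens are inserted manually (the German sentences below are made up for illustration):

encoded = tokenizer('Das ist Satz eins.', 'Das ist Satz zwei.', truncation=True)
print(tokenizer.decode(encoded['input_ids']))
# prints something like: [CLS] Das ist Satz eins.[SEP] Das ist Satz zwei.[SEP]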

The code runs in a Jupyter Notebook on Google Colab. The task is binary classification of sequence pairs.

Is there an issue with the model, or did I make a mistake?

Thanks in advance

Best regards
SacrumDeus

Training with the model microsoft/deberta-v3-base and the exact same parameters worked. It performed pretty well, even though that model was not pretrained on German data.

I ran into the same issue. It turns out mDeBERTa does not support fp16 training yet, as described in this issue. Simply turning off fp16 in your training arguments should fix it.
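Concretely, the change amounts to this (the bf16 line is an optional alternative, not something from the linked issue; it requires an Ampere-class GPU such as an A100):

training_args = TrainingArguments(
    output_dir='./model-results',
    # ... all other arguments unchanged ...
    fp16=False,    # mDeBERTa currently breaks under fp16
    # bf16=True,   # optional: bfloat16 mixed precision on Ampere+ GPUs
)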

Hey @bingyinh

thanks for your reply. The same happened to me with the T5 model.

Since I used microsoft/deberta-v3-base as the model for training, the error does not occur.
I will mark this as resolved :wink:

Best regards
SacrumDeus