Hey there,
I’ve been trying to fine-tune mdeberta-v3-base on Google Colab (Premium GPU). However, the model does not seem to learn: the metrics do not change at all across epochs.
Training of DeBERTa
Training arguments:

```python
import os
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./model-results',
    num_train_epochs=10,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=64,
    weight_decay=0.01,               # PyTorch AdamW default
    logging_dir=os.path.join('logs', 'DeBERTa-training-small'),
    logging_steps=100,               # default is 500
    disable_tqdm=False,
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='eval_f1',
    save_total_limit=5,
    fp16=True,
    report_to='wandb',
    learning_rate=5e-6,
)
```
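Since `metric_for_best_model='eval_f1'` requires an `eval_f1` metric, I pass a `compute_metrics` function to the `Trainer`. It looks roughly like the sketch below (the sklearn-based version here is an approximation of what I use):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Turn raw logits into the accuracy/F1/precision/recall values logged below."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average='binary'
    )
    return {
        'accuracy': accuracy_score(labels, preds),
        'f1': f1,
        'precision': precision,
        'recall': recall,
    }
```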
Dataset language: German
I’ve trained XLM-RoBERTa and BERT on the exact same data with the same parameters without any issue.
Training of XLM-RoBERTa
Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
---|---|---|---|---|---|---|
1 | 0.178500 | 0.109758 | 0.971864 | 0.960648 | 0.933843 | 0.989037 |
2 | 0.065900 | 0.095973 | 0.981794 | 0.974166 | 0.960185 | 0.988561 |
3 | 0.076800 | 0.077241 | 0.986097 | 0.980066 | 0.975898 | 0.984271 |
4 | 0.022800 | 0.080963 | 0.985601 | 0.979505 | 0.968328 | 0.990944 |
5 | 0.044600 | 0.075488 | 0.987918 | 0.982714 | 0.976471 | 0.989037 |
6 | 0.023200 | 0.089165 | 0.985932 | 0.979986 | 0.968357 | 0.991897 |
7 | 0.018500 | 0.097132 | 0.986925 | 0.981407 | 0.969317 | 0.993804 |
8 | 0.029200 | 0.110928 | 0.984111 | 0.977528 | 0.960442 | 0.995234 |
9 | 0.016000 | 0.106296 | 0.986428 | 0.980724 | 0.967532 | 0.994280 |
10 | 0.010700 | 0.104699 | 0.986594 | 0.980955 | 0.967981 | 0.994280 |
Additionally, if I increase the batch size to 16, I run out of CUDA memory (40 GB available).
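As a side note, I assume I could approximate an effective batch size of 16 via gradient accumulation instead (sketch below), but I would still expect a batch size of 16 to fit into 40 GB:

```python
# Assumption on my side: emulate a batch size of 16 without the OOM
# by accumulating gradients over two steps of 8.
training_args = TrainingArguments(
    output_dir='./model-results',
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,  # 8 * 2 = effective batch size of 16
    # ... remaining arguments as above ...
)
```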
Initialization of the tokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    'microsoft/mdeberta-v3-base', use_fast=True, max_length=1024
)
```
Initialization of the model:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    'microsoft/mdeberta-v3-base', num_labels=2
)
```
Data formatting:

```
[CLS] sequence 1 [SEP] sequence 2 [SEP]
```
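This layout comes from passing both sequences to the tokenizer as a pair. A minimal example (the sentences are placeholders):

```python
# Hypothetical sentence pair; encoding two texts together produces the
# [CLS] ... [SEP] ... [SEP] layout shown above.
encoded = tokenizer('Satz eins.', 'Satz zwei.', truncation=True)
print(tokenizer.decode(encoded['input_ids']))
# -> [CLS] Satz eins. [SEP] Satz zwei. [SEP] (roughly)
```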
The code runs in a Jupyter notebook on Google Colab. The task is binary classification of sequence pairs.
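For completeness, everything is wired together roughly like this (`train_dataset` and `eval_dataset` are placeholders for my tokenized splits):

```python
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: tokenized training split
    eval_dataset=eval_dataset,    # placeholder: tokenized validation split
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
```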
Is there an issue with the model, or did I make a mistake somewhere?
Thanks in advance
Best regards
SacrumDeus