Fine-Tuning DeBERTa Produces Non-Results

Hi there,

I am currently working on a binary text classification problem. My baseline is a RoBERTa model tuned over the following search space:

'parameters': {
    'learning_rate': {
        'values': [5e-4, 1e-4, 5e-5, 3e-5]},
    'per_device_train_batch_size': {
        'values': [8, 16, 32, 64]},
    'num_train_epochs': {
        'values': [3, 4, 5]},
}
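
(For context, the surrounding sweep setup looks roughly like the sketch below, assuming a W&B-style sweep; the search method, metric name, and project name are simplified placeholders, not my actual values.)

import wandb

sweep_config = {
    'method': 'grid',  # placeholder: exhaustive search over the listed values
    'metric': {'name': 'eval/f1', 'goal': 'maximize'},  # placeholder metric name
    'parameters': {
        'learning_rate': {'values': [5e-4, 1e-4, 5e-5, 3e-5]},
        'per_device_train_batch_size': {'values': [8, 16, 32, 64]},
        'num_train_epochs': {'values': [3, 4, 5]},
    },
}

sweep_id = wandb.sweep(sweep_config, project='binary-text-clf')  # placeholder project name
# wandb.agent(sweep_id, function=train_fn, count=48)  # train_fn = your training entry point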

Next, I’d like to test whether I can improve over this baseline using microsoft/deberta-v3-small. The only two lines of code I changed to make it run are:

tokenizer = AutoTokenizer.from_pretrained('microsoft/deberta-v3-small', use_fast=True)
model = AutoModelForSequenceClassification.from_pretrained(
    'microsoft/deberta-v3-small', num_labels=2
)

Unfortunately, the DeBERTa model produces non-results for every one of the hyperparameter candidates (i.e., a ROC-AUC of 0.5 and an F1 of 0), and I just can’t figure out why. When I switch back to the roberta-base checkpoint, everything runs smoothly and the model learns as expected.

Without reproducing my entire modeling script here: do any differences between the two models come to mind that would require changes to my codebase beyond the two lines above? Grateful for any suggestions!

DeBERTa can be really sensitive to learning rate, so I’d recommend trying lower learning rates.
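
For example, something along these lines is usually a safer starting point for deberta-v3-small (the exact values are just common defaults I reach for, nothing official); in your sweep, I’d expect 5e-4 and 1e-4 in particular to be the values that break training:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='deberta-v3-small-binary-clf',  # placeholder output directory
    learning_rate=2e-5,                         # well below the 5e-4/1e-4 end of your grid
    per_device_train_batch_size=16,
    num_train_epochs=4,
    warmup_ratio=0.1,                           # a short warmup often helps stability
    weight_decay=0.01,
    evaluation_strategy='epoch',
)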

Are you able to share your training script?


@nbroad That does indeed appear to be the solution. With a learning rate of 3e-5, the model finally produces reasonable outputs! Thanks for the suggestion.

Two quick follow ups:

  • When loading the DeBERTa fast tokenizer, transformers throws the following warning:
    /usr/local/lib/python3.7/dist-packages/transformers/convert_slow_tokenizer.py:447: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
      "The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option"
    
    Is this something I should take into consideration? Put differently: Is there a way to work around it? I had to install sentencepiece to use the tokenizer in the first place.
  • When running my training loop I also receive the following warnings:
    /usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:746: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
      scale, dtype=query_layer.dtype
    /usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:829: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
      score += c2p_att / torch.tensor(scale, dtype=c2p_att.dtype)
    /usr/local/lib/python3.7/dist-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:852: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
      score += p2c_att / torch.tensor(scale, dtype=p2c_att.dtype)
    
    Any idea how to circumvent these?

Neither of those is a problem; they’re just warnings.
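
That said, if you’d rather not see them at all: for the first one you can fall back to the slow sentencepiece tokenizer, and both can simply be filtered. A quick sketch (the message patterns below are just taken from the warnings you pasted):

import warnings
from transformers import AutoTokenizer

# Option 1: sidestep the byte-fallback warning by loading the slow sentencepiece tokenizer
tokenizer = AutoTokenizer.from_pretrained('microsoft/deberta-v3-small', use_fast=False)

# Option 2: keep the fast tokenizer and silence the warnings shown above
warnings.filterwarnings('ignore', message='.*byte fallback.*')
warnings.filterwarnings('ignore', message='.*To copy construct from a tensor.*')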