Fine-tune xlm-roberta-large-xnli

Hi everyone, I am working with the joeddav/xlm-roberta-large-xnli model and fine-tuning it on Turkish text for classification (Positive, Negative, Neutral).

My problem is that when fine-tuning on a really small dataset (20K finance texts), even training for 1 epoch seems to destroy the weights in the model, so it doesn't produce any meaningful results after fine-tuning.

Is there a way to regulate how much the model's weights get updated?
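For example, would freezing most of the encoder and only training the classification head be a sensible option? Something along these lines (just a sketch of what I mean, assuming `model` is the XLMRobertaForSequenceClassification loaded below):

# Sketch: freeze the whole encoder so only the classification head gets updated
for param in model.roberta.parameters():
    param.requires_grad = False
# model.classifier stays trainable, so the pretrained encoder weights are untouched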

Here is my model:

# Import model and tokenizer
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("joeddav/xlm-roberta-large-xnli")

model = AutoModelForSequenceClassification.from_pretrained("joeddav/xlm-roberta-large-xnli").cuda()


Some weights of the model checkpoint at joeddav/xlm-roberta-large-xnli were not used when initializing XLMRobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

I’m not sure whether this warning is expected for my fine-tuning setup or not.
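One more thing I was unsure about: the XNLI head's three outputs correspond to NLI labels (entailment / neutral / contradiction), not to my sentiment classes, so I simply relabeled them in the config like this (not sure if this is the right approach; the mapping below is just my own choice):

# Hypothetical relabeling: reuse the existing 3-way head but rename its outputs for sentiment
model.config.id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}
model.config.label2id = {label: i for i, label in model.config.id2label.items()}
# Note: this only changes the label names; the head weights still reflect the NLI task until fine-tuning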

Another problem could be the batch size, since I am training with batch_size=1 (because the model is huge).

Here are my training arguments:

from transformers import TrainingArguments

training_args = TrainingArguments("test_trainer",
                                  num_train_epochs=5,
                                  per_device_train_batch_size=1,
                                  per_device_eval_batch_size=1,
                                  evaluation_strategy="steps",
                                  seed=42,
                                  save_strategy="epoch",
                                  logging_strategy="steps",
                                  logging_steps=500,
                                  eval_steps=500)
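Would something like this be a reasonable way to slow down the updates and simulate a larger batch? (Just a guess on my part; the learning rate, warmup, and accumulation values below are arbitrary.)

training_args = TrainingArguments("test_trainer",
                                  num_train_epochs=1,
                                  per_device_train_batch_size=1,
                                  gradient_accumulation_steps=16,  # effective batch size of 16
                                  learning_rate=1e-5,              # smaller updates than the default 5e-5
                                  warmup_ratio=0.1,                # ramp the learning rate up slowly
                                  fp16=True,                       # halve activation memory
                                  evaluation_strategy="steps",
                                  eval_steps=500,
                                  logging_strategy="steps",
                                  logging_steps=500,
                                  save_strategy="epoch",
                                  seed=42)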

Hi!
I might be wrong, but this model has already been fine-tuned, and its model card says that "This model is intended to be used for zero-shot text classification."
That is, as far as I understand, you should fine-tune the base model instead, which is xlm-roberta-large.
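Something along these lines, I believe (just a rough sketch; the new classification head would be randomly initialized and trained from scratch on your data):

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
# num_labels=3 creates a fresh 3-way head for Positive / Negative / Neutral
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-large", num_labels=3)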

Please keep us updated. I am interested in the outcome :pray: