Fine-tune xlm-roberta-large-xnli

Hi everyone, I am working with the joeddav/xlm-roberta-large-xnli model and fine-tuning it on Turkish text for classification (Positive, Negative, Neutral).

My problem is that when fine-tuning on a really small dataset (20K finance texts), even training for 1 epoch seems to destroy the weights in the model, so it doesn't produce any meaningful results after fine-tuning.

Is there a way to regulate how much the model's weights get updated?
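For example, would freezing most of the encoder and only training the classification head be a sensible option? Something along these lines (just a sketch of what I mean, assuming `model` is the XLMRobertaForSequenceClassification loaded below):

# Sketch: freeze the whole encoder so only the classification head gets updated
for param in model.roberta.parameters():
    param.requires_grad = False
# model.classifier stays trainable, so the pretrained encoder weights are untouched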

Here is my model:

# Import model and tokenizer
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("joeddav/xlm-roberta-large-xnli")

model = AutoModelForSequenceClassification.from_pretrained("joeddav/xlm-roberta-large-xnli").cuda()


Some weights of the model checkpoint at joeddav/xlm-roberta-large-xnli were not used when initializing XLMRobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

I’m not sure whether this warning is expected for my fine-tuning setup or not.
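One more thing I was unsure about: the XNLI head's three outputs correspond to NLI labels (entailment / neutral / contradiction), not to my sentiment classes, so I simply relabeled them in the config like this (not sure if this is the right approach; the mapping below is just my own choice):

# Hypothetical relabeling: reuse the existing 3-way head but rename its outputs for sentiment
model.config.id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}
model.config.label2id = {label: i for i, label in model.config.id2label.items()}
# Note: this only changes the label names; the head weights still reflect the NLI task until fine-tuning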

Another problem could be the batch size, since I am training with batch_size=1 (because the model is huge).

Here are my training arguments:

from transformers import TrainingArguments

training_args = TrainingArguments("test_trainer",
                                  num_train_epochs=5,
                                  per_device_train_batch_size=1,
                                  per_device_eval_batch_size=1,
                                  evaluation_strategy="steps",
                                  seed=42,
                                  save_strategy="epoch",
                                  logging_strategy="steps",
                                  logging_steps=500,
                                  eval_steps=500)
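Would something like this be a reasonable way to slow down the updates and simulate a larger batch? (Just a guess on my part; the learning rate, warmup, and accumulation values below are arbitrary.)

training_args = TrainingArguments("test_trainer",
                                  num_train_epochs=1,
                                  per_device_train_batch_size=1,
                                  gradient_accumulation_steps=16,  # effective batch size of 16
                                  learning_rate=1e-5,              # smaller updates than the default 5e-5
                                  warmup_ratio=0.1,                # ramp the learning rate up slowly
                                  fp16=True,                       # halve activation memory
                                  evaluation_strategy="steps",
                                  eval_steps=500,
                                  logging_strategy="steps",
                                  logging_steps=500,
                                  save_strategy="epoch",
                                  seed=42)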

Hi!
I might be wrong, but this model has already been fine-tuned, and its model card says that "This model is intended to be used for zero-shot text classification."
That is, as far as I understand, you should fine-tune the base model instead, which is xlm-roberta-large.
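Something along these lines, I believe (just a rough sketch; the new classification head would be randomly initialized and trained from scratch on your data):

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
# num_labels=3 creates a fresh 3-way head for Positive / Negative / Neutral
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-large", num_labels=3)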

Please keep us updated. I am interested in the outcome :pray: