Hi everyone, I am working with the joeddav/xlm-roberta-large-xnli model and fine-tuning it on Turkish text for three-class text classification (Positive, Negative, Neutral).
My problem is with fine-tuning on a really small dataset (20K finance texts): I feel like even training for 1 epoch destroys all the weights in the model, so it doesn't produce any meaningful results after fine-tuning.
Is there a way to regulate the update rate of the model?
Here is my model:
# Import the tokenizer and model
from transformers import AutoModelForSequenceClassification, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("joeddav/xlm-roberta-large-xnli")
model = AutoModelForSequenceClassification.from_pretrained("joeddav/xlm-roberta-large-xnli").cuda()
Some weights of the model checkpoint at joeddav/xlm-roberta-large-xnli were not used when initializing XLMRobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
I’m not sure whether this is expected for my fine-tuned model.
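To make sense of that warning, here is a minimal sketch of how such a message can arise: the loader compares the parameter names stored in the checkpoint with the names the model class defines, and reports checkpoint entries that have no counterpart. The key lists below are a small illustrative subset that I am assuming, not the real state dict, and `unused_keys` and `missing_keys` are hypothetical helpers, not library functions.

```python
def unused_keys(checkpoint_keys, model_keys):
    """Return checkpoint parameter names that the model has no slot for."""
    return sorted(set(checkpoint_keys) - set(model_keys))


def missing_keys(checkpoint_keys, model_keys):
    """Return model parameter names that the checkpoint does not provide."""
    return sorted(set(model_keys) - set(checkpoint_keys))


# Illustrative subset of names (assumed, not read from the real checkpoint).
checkpoint = [
    "roberta.pooler.dense.weight",
    "roberta.pooler.dense.bias",
    "classifier.dense.weight",
    "classifier.out_proj.weight",
]
model = [
    "classifier.dense.weight",
    "classifier.out_proj.weight",
]

print("unused:", unused_keys(checkpoint, model))
print("missing:", missing_keys(checkpoint, model))
```

If only the pooler entries show up as unused and nothing is reported as newly initialized, the classification head was presumably loaded from the checkpoint, which is what the log above seems to suggest.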
Another possible problem is the batch size: I am using batch_size=1 because the model is huge.
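To put the batch size in perspective, here is a rough sketch of how many optimizer updates a run performs and how gradient accumulation would change that. The helper `optimizer_steps` is hypothetical, the accumulation value of 16 is an assumption, and the count is only an approximation of what a trainer would report.

```python
import math


def optimizer_steps(num_examples, epochs, per_device_batch_size,
                    grad_accum_steps=1, num_devices=1):
    """Count optimizer updates for a run and the effective batch size per update."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * epochs, effective_batch


# Setup from this post: roughly 20K examples, 5 epochs, batch size 1.
print(optimizer_steps(20_000, 5, 1))
# Same data with 16 accumulated micro-batches per update (16 is an assumed value).
print(optimizer_steps(20_000, 5, 1, grad_accum_steps=16))
```

With batch size 1, every single example triggers its own noisy weight update, so accumulating gradients over several examples may be one way to smooth the updates without using more GPU memory.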
Here are my training arguments:
from transformers import TrainingArguments

training_args = TrainingArguments(
    "test_trainer",
    num_train_epochs=5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    evaluation_strategy="steps",
    seed=42,
    save_strategy="epoch",
    logging_strategy="steps",
    logging_steps=500,
    eval_steps=500,
)
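On the question of regulating the update rate, the options I know of are the learning rate, a warmup phase, and gradient accumulation, which `TrainingArguments` accepts as `learning_rate`, `warmup_ratio`, and `gradient_accumulation_steps`. A minimal sketch follows; every numeric value is an assumption of mine and untested on this dataset, and `linear_warmup_decay` is a hypothetical helper that only mimics a linear warmup/decay schedule to show its effect.

```python
def linear_warmup_decay(step, total_steps, warmup_steps, peak_lr):
    """Learning rate at `step`: linear ramp up to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed values for illustration only; not tuned.
extra_args = dict(
    learning_rate=1e-5,              # lower than the 5e-5 default
    warmup_ratio=0.1,                # ramp up over the first 10% of updates
    weight_decay=0.01,
    gradient_accumulation_steps=16,  # effective batch size of 16
)

total_steps = 6250  # about 20K examples, 5 epochs, effective batch size 16
warmup_steps = int(total_steps * extra_args["warmup_ratio"])
for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    lr = linear_warmup_decay(step, total_steps, warmup_steps,
                             extra_args["learning_rate"])
    print(step, lr)

# The dict could be passed alongside the existing arguments as **extra_args
# when constructing TrainingArguments.
```

The idea is that a smaller peak learning rate and a gradual ramp keep the first updates small, which may help preserve the pretrained weights on a small dataset.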