Fine-tuning multilingual BERT for sequence classification with Trainer API

mu1990 · November 29, 2023, 7:46pm

Hi,

I’m fine-tuning multilingual BERT for sequence classification, with inputs formatted like this: [CLS] context [SEP] choice [SEP] [PAD] …

I’m using the Trainer API:

import evaluate
import torch
from transformers import Trainer, TrainingArguments

batch_size = 8 # try with 32
num_train_epochs = 6
logging_steps = len(encoded_datasets['train']) // (2 * batch_size * num_train_epochs)

training_args = TrainingArguments(
    output_dir="results",
    overwrite_output_dir=True,
    num_train_epochs=num_train_epochs,
    learning_rate=0.01, # {5e-5, 3e-5, 2e-5, 0.1}
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    warmup_steps=500,
    weight_decay=0.1, # {0, 0.01, 0.1}
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_steps=logging_steps,
)

def compute_metrics(eval_pred):
    metric = evaluate.load("accuracy")
    logits, labels = eval_pred
    logitsTensors = torch.from_numpy(logits)
    print('eval_pred:', eval_pred)
    print('-'*40)
    print('logits:', logits)
    print('-'*40)
    print('labels:', labels)
    print('-'*40)
    probabilities = torch.softmax(logitsTensors, dim=1)
    predictions = torch.argmax(probabilities, dim=1) # [1, 0, 0, 1...] axis=-1
    print('predictions: ', predictions)
    # return {"accuracy": np.mean(predictions == labels)}
    return metric.compute(predictions=predictions, references=labels)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_datasets['train'],
    eval_dataset=encoded_datasets['validation'],
    #data_collator=data_collator,
    tokenizer=tokenizer,
    # id2label=id2label,
    # label2id=label2id,
    compute_metrics=compute_metrics
)

I have experimented with different parameters, but I almost always get an accuracy of 0.5, and my training and validation losses stay almost the same.

What could this mean?

Could you recommend some guidelines?

Thank you!
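
For reference, the schedule implied by these arguments can be worked out by hand. The sketch below assumes the 1152 training instances mentioned later in the thread and a single device; the variable names are illustrative and not part of the original script.

```python
import math

# Assumption: 1152 training instances (the figure given later in the thread), one device.
num_train_samples = 1152
batch_size = 8
num_train_epochs = 6
warmup_steps = 500
learning_rate = 0.01

# Same formula as in the post.
logging_steps = num_train_samples // (2 * batch_size * num_train_epochs)

# Optimizer steps per epoch and over the whole run.
steps_per_epoch = math.ceil(num_train_samples / batch_size)
total_steps = steps_per_epoch * num_train_epochs

# Share of the run spent in warmup, and how far the learning rate sits
# above the largest commented-out candidate (5e-5).
warmup_fraction = warmup_steps / total_steps
lr_ratio = learning_rate / 5e-5

print(logging_steps)              # 12
print(steps_per_epoch)            # 144
print(total_steps)                # 864
print(round(warmup_fraction, 2))  # 0.58
print(round(lr_ratio))            # 200
```

Under these assumptions the warmup covers more than half of the 864 optimizer steps, and the learning rate is about 200 times the largest of the commented-out candidates, so both settings are worth ruling out before looking elsewhere.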

jeevisha30 · December 1, 2023, 6:14pm

Can you share your training script?

mu1990 · December 1, 2023, 7:44pm

Hi, thank you for your answer.

I don’t have a training script. I was hoping to train the model through Trainer as shown, without a PyTorch training loop. Is this approach correct?

The accuracy is always stuck at 50%. What could that mean?

jeevisha30 · December 5, 2023, 10:56pm

Can you try this and see what accuracy you get?

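    import numpy as np
    import torch
    from sklearn.metrics import accuracy_score

    def compute_metrics(p):
        # p holds the (logits, labels) pair that Trainer passes to compute_metrics
        logits, labels = p
        logits = logits.tolist()
        labels = labels.tolist()
        pred = np.argmax(logits, axis=1).tolist()

        # Softmax probabilities, handy for inspecting how confident the model is
        logits_tensor = torch.tensor(logits)
        prob = torch.nn.functional.softmax(logits_tensor, dim=-1).tolist()
        accuracy = accuracy_score(y_true=labels, y_pred=pred)
        return {"accuracy": accuracy}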

mu1990 · December 7, 2023, 4:57pm

Hi, thank you for your help.

I ran the experiment for 6 epochs. These are the results:

{'eval_loss': 0.6945369243621826, 'eval_accuracy': 0.4765625, 'eval_runtime': 1.1837, 'eval_samples_per_second': 108.136, 'eval_steps_per_second': 13.517, 'epoch': 1.0}

{'eval_loss': 0.6935781836509705, 'eval_accuracy': 0.5078125, 'eval_runtime': 1.1774, 'eval_samples_per_second': 108.718, 'eval_steps_per_second': 13.59, 'epoch': 2.0}

{'eval_loss': 0.6960821747779846, 'eval_accuracy': 0.5, 'eval_runtime': 1.1887, 'eval_samples_per_second': 107.677, 'eval_steps_per_second': 13.46, 'epoch': 3.0}

{'eval_loss': 0.6931933760643005, 'eval_accuracy': 0.5, 'eval_runtime': 1.1842, 'eval_samples_per_second': 108.091, 'eval_steps_per_second': 13.511, 'epoch': 4.0}

{'eval_loss': 0.6936441659927368, 'eval_accuracy': 0.484375, 'eval_runtime': 1.183, 'eval_samples_per_second': 108.197, 'eval_steps_per_second': 13.525, 'epoch': 5.0}

{'eval_loss': 0.6934036016464233, 'eval_accuracy': 0.484375, 'eval_runtime': 1.1699, 'eval_samples_per_second': 109.411, 'eval_steps_per_second': 13.676, 'epoch': 6.0}

{'train_runtime': 217.6431, 'train_samples_per_second': 31.758, 'train_steps_per_second': 3.97, 'train_loss': 0.7024143382355019, 'epoch': 6.0}
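
An evaluation loss that hovers around 0.693 is what a binary classifier produces when it assigns probability 0.5 to both classes, since the cross-entropy of a uniform two-class prediction is ln 2. A quick check, as a sketch using only the standard library (the function name is illustrative):

```python
import math

def cross_entropy(prob_of_true_class: float) -> float:
    """Cross-entropy loss for one example, given the probability assigned to its true label."""
    return -math.log(prob_of_true_class)

# A model that is maximally unsure between two classes gives each one probability 0.5.
chance_loss = cross_entropy(0.5)
print(round(chance_loss, 4))  # 0.6931
```

The reported eval_loss values (roughly 0.6932 to 0.6961) sit at or just above this number, which suggests the model's outputs carry almost no information about the label.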

There is no change in the evaluation loss, and the accuracy is stuck. I also noticed that all the logits are negative, like this (why is that?):

logits_tensor: tensor([[-0.0935, -0.1828],
        [-0.1098, -0.1829],
        [-0.0728, -0.1879],
        [-0.0940, -0.1868],
        [-0.2301, -0.2504],
        [-0.2266, -0.2678],
        [-0.2459, -0.2656],
        [-0.2376, -0.2704],
        [-0.1885, -0.0431],
        [-0.2165, -0.0214],
        …
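
On the sign of the logits: softmax depends only on the differences between logits, so all-negative values are not a problem in themselves. The sketch below, written with the standard library rather than torch, takes the first row printed above and shows that shifting both logits by a constant leaves the probabilities unchanged.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

row = [-0.0935, -0.1828]          # first row of the printed logits
shifted = [x + 5.0 for x in row]  # same logits, shifted so both are positive

probs = softmax(row)
probs_shifted = softmax(shifted)

print([round(p, 3) for p in probs])          # [0.522, 0.478]
print([round(p, 3) for p in probs_shifted])  # [0.522, 0.478]
```

Both probabilities are close to 0.5, which matches the near-chance loss. What matters is the gap between the two logits, not their sign.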

I think my validation partition is wrongly formulated. I believe I have to transform it back to the original format and map the highest logit to the corresponding label, which would be the prediction.

jeevisha30 · December 8, 2023, 4:32pm

Maybe your data quality is bad. I kept getting 40% accuracy too, so I improved my data, and now I am at 60%. How big is your training data? You will have to do an error analysis: make a confusion matrix and start with the classes that are most often confused with each other.
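
A confusion matrix for the binary case can be built in a few lines. This is a minimal sketch with made-up labels and predictions (not data from this thread), using only the standard library; sklearn.metrics.confusion_matrix gives the same counts if scikit-learn is available.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, num_classes=2):
    """Return matrix[t][p] = number of examples with true label t predicted as p."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in range(num_classes)] for t in range(num_classes)]

# Hypothetical labels and predictions, for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0]

matrix = confusion_matrix(y_true, y_pred)
print(matrix)  # [[4, 0], [4, 0]]
```

A matrix like this one, with an empty column, shows a model that always predicts the same class, which on a balanced validation set yields exactly 50% accuracy.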

mu1990 · December 8, 2023, 5:47pm

Hi there,

I’m fine-tuning a Spanish version of BERT with 1152 instances. I have now switched the training to native PyTorch. I will look more deeply into what you are suggesting. I also believe I’m doing the evaluation wrong.

I am evaluating sentences for two options/candidates as a binary classification task, where my classes are 0 and 1.

Each sentence is split into two instances, one per option. I think I need to map the instances back to the original sentences, using the label predicted by the model, and evaluate on that validation set.
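
The regrouping described here could look like the sketch below. It assumes each original sentence was expanded into exactly two consecutive (context, choice) rows, and that the logit for class 1 scores how likely a choice is to be the correct one; the function name and the example values are hypothetical.

```python
def pick_choices(class1_logits, choices_per_item=2):
    """Group per-choice scores by original item and return the index of the best choice."""
    if len(class1_logits) % choices_per_item != 0:
        raise ValueError("number of rows must be a multiple of choices_per_item")
    picked = []
    for start in range(0, len(class1_logits), choices_per_item):
        group = class1_logits[start:start + choices_per_item]
        # Index of the highest-scoring choice within this item's group.
        picked.append(max(range(len(group)), key=group.__getitem__))
    return picked

# Hypothetical class-1 logits for three sentences with two candidate choices each.
class1_logits = [0.8, -0.3, -1.2, 0.4, 0.1, 0.9]
gold_choice = [0, 1, 0]  # hypothetical index of the correct choice per sentence

predicted = pick_choices(class1_logits)
accuracy = sum(p == g for p, g in zip(predicted, gold_choice)) / len(gold_choice)
print(predicted)           # [0, 1, 1]
print(round(accuracy, 3))  # 0.667
```

Scoring per original sentence in this way guarantees that exactly one choice is selected for each item, which a row-by-row argmax does not.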

I do appreciate your advice.

mu1990 · December 12, 2023, 12:53am

Could we talk about our experiments and share knowledge about LLMs and programming?
This is my email: musta.ali.saba @ g mail . com
