Hi All, I have a dataset of about 100,000 email messages, each labelled with one of roughly 300 labels, that I am using to train a model for automated email classification.
During training with the pre-shuffled dataset, split 80/20 into train/test respectively, my eval accuracy never gets very good, even after 10-15 epochs (each epoch takes about an hour to train).
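For context, the split is done ahead of time with the datasets library, roughly along these lines (the file path and the "text"/"label" column names are just placeholders for my actual data):

from datasets import load_dataset

# Placeholder file and column names - my real dataset has its own schema.
dataset = load_dataset("csv", data_files="emails.csv")["train"]
dataset = dataset.shuffle(seed=42)               # pre-shuffle
split = dataset.train_test_split(test_size=0.2)  # 80/20 train/test
train_ds, eval_ds = split["train"], split["test"]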
I am using AutoModelForSequenceClassification initialized from the distilbert-base-uncased pre-trained model, with the training arguments below:
from transformers import TrainingArguments

train_batch_size = 32
eval_batch_size = 8
num_train_epochs = 24

training_args = TrainingArguments(
    output_dir="my_awesome_model_BASE_5e-5",
    overwrite_output_dir=False,
    per_device_train_batch_size=train_batch_size,
    per_device_eval_batch_size=eval_batch_size,
    num_train_epochs=num_train_epochs,
    learning_rate=5e-5,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # evaluate at the end of every epoch
    save_strategy="epoch",        # checkpoint at the end of every epoch
    load_best_model_at_end=True,
    push_to_hub=False,
)
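For completeness, the rest of my setup looks roughly like this (tokenized_train / tokenized_eval stand in for my tokenized 80/20 splits):

from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer
import numpy as np
import evaluate

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=300
)

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # Report accuracy on the eval split at the end of each epoch.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=preds, references=labels)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,  # tokenized 80% split
    eval_dataset=tokenized_eval,    # tokenized 20% split
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()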
Is there anything I can do to improve the accuracy? I have tried lower learning rates and adjusting the train/eval batch sizes and the number of epochs, but it just doesn't seem to be getting anywhere.
Appreciate any help!