Hello everyone,
I trained BERT on the QNLI dataset for 20 epochs, and here are the losses I got:
We can see that the training loss increases within each epoch and then drops at the start of the next one. I don't really understand this behaviour; does anyone have an idea where it might come from? Is it normal, or do you think it could come from a problem in my code?
There are also “spikes” appearing at the end of each epoch (especially after 8 epochs). I think they come from my batch size: it is quite small, so the last batch of each epoch contains only 2 samples.
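If that tiny last batch really is the culprit, I suppose I could simply drop it with the DataLoader's drop_last flag, something like this (the dataset name and batch size are just placeholders for what I actually use):

from torch.utils.data import DataLoader

# Drop the incomplete final batch so every training step sees a full batch
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, drop_last=True)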
Here are the specifications of my training:
- The model I used is “bert-base-cased”, taken pre-trained from the Transformers library; same for the tokenizer.
- I split the training set with an 80/20 ratio to get the validation set.
- I optimized using Adam with a learning rate of 3e-5 and nothing else (no scheduler, no other hyper-parameter changes).
- I am evaluating on the validation set 10 times per epoch.
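To make that concrete, the setup boils down to roughly the following (a simplified sketch; the exact way I load and split the data is only illustrative):

from datasets import load_dataset
from torch.optim import Adam
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Pre-trained model and tokenizer from the Transformers library
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)

# 80/20 split of the QNLI training set to get a validation set
# (shown here with the datasets library; my actual loading code may differ)
qnli_train = load_dataset("glue", "qnli", split="train")
split = qnli_train.train_test_split(test_size=0.2)
train_set, val_set = split["train"], split["test"]

# Plain Adam, learning rate 3e-5, nothing else
optimizer = Adam(model.parameters(), lr=3e-5)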
And here is the code I use on each batch for training:
optimizer.zero_grad()
output = model(input_ids,
               attention_mask=attention_masks,
               token_type_ids=token_type_ids,
               labels=labels)
loss = output.loss  # loss computed by the model head when labels are passed
loss.backward()
optimizer.step()
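And in case it matters, the evaluation I run 10 times per epoch is essentially just the average loss over the validation loader (simplified sketch; val_loader and the batch keys are placeholders for my actual data pipeline):

import torch

model.eval()
total_loss = 0.0
with torch.no_grad():
    for batch in val_loader:
        out = model(batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    token_type_ids=batch["token_type_ids"],
                    labels=batch["labels"])
        total_loss += out.loss.item()
val_loss = total_loss / len(val_loader)  # average validation loss for this checkpoint
model.train()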
Thank you in advance for your help!