Fine-Tuning results suggest some underlying implementation error?

Turns out there was no error! Two things:

  1. The learning rate was small, so each optimizer step barely changed the weights, and the validation loss was being evaluated very frequently, so consecutive points were nearly identical. That's why the validation-loss curve looked so smooth (see the sketch after this list).

  2. I needed to run 50 training epochs to see a real difference. That seems like a lot, but with a learning rate that small each epoch only nudges the weights, so be it :man_shrugging:
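
For anyone hitting the same thing, here's a minimal sketch of both effects. Toy data, toy model, and all names (`X_train`, `EPOCHS`, etc.) are illustrative, not the original setup; the point is just that a tiny learning rate plus per-step validation gives a smooth curve, and that it can take many epochs before the loss visibly moves:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data standing in for the fine-tuning set.
X_train, y_train = torch.randn(256, 16), torch.randn(256, 1)
X_val, y_val = torch.randn(64, 16), torch.randn(64, 1)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()

# Small learning rate: each step barely moves the weights, so
# back-to-back validation evaluations differ only slightly.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

EPOCHS = 50  # with lr this small, it takes many epochs to see movement

for epoch in range(EPOCHS):
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    optimizer.step()

    # Validation after every single optimizer step: consecutive points
    # are nearly identical, which is what makes the curve look smooth.
    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()
    if epoch % 10 == 0 or epoch == EPOCHS - 1:
        print(f"epoch {epoch:3d}  val_loss {val_loss:.4f}")
```

Running it, the printed validation loss creeps down by tiny increments per epoch, so a curve plotted from every step looks perfectly smooth even though nothing is wrong with the training code.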