Turns out there was no error! Two things:
- The learning rate was small and the validation loss was being evaluated very frequently, which explains why the validation loss was so smooth.
- I needed to run 50 training epochs to see a real difference. That seems odd, but so be it.
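The two observations above can be reproduced with a toy training loop. This is a minimal sketch with hypothetical names (`train`, `eval_every`, `steps_per_epoch` are illustrative, not from the original setup): with a small learning rate, evaluating validation loss after every step records many nearly identical values (a smooth-looking curve), and the loss only drops substantially after many epochs.

```python
def train(epochs, steps_per_epoch, eval_every, lr=1e-4):
    """Toy 1-D gradient descent on (w - target)^2, recording 'validation'
    loss every `eval_every` steps. Purely illustrative."""
    w, target = 0.0, 1.0
    val_losses = []
    step = 0
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            grad = 2 * (w - target)   # gradient of the squared error
            w -= lr * grad            # small lr -> very slow progress
            step += 1
            if step % eval_every == 0:
                val_losses.append((w - target) ** 2)
    return val_losses

# Frequent evaluation over one epoch: many near-identical points,
# i.e. a very smooth validation curve.
frequent = train(epochs=1, steps_per_epoch=100, eval_every=1)

# Far fewer evaluations over 50 epochs: the loss finally moves.
sparse = train(epochs=50, steps_per_epoch=100, eval_every=100)
```

In this sketch, `frequent` barely changes from first to last entry, while `sparse` ends well below where it started, mirroring why 50 epochs were needed to see a real difference.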