I get two different accuracies, one from manual evaluation and one from Hugging Face's Trainer. They are completely different, and I am shocked.
You are setting drop_last in your eval dataloader, which in general you should never do, because it discards the final incomplete batch and the metric is then computed on fewer examples. Does removing it fix this?
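To illustrate why this matters, here is a minimal sketch in plain Python. It does not use your code or the real DataLoader; `iter_batches` and `accuracy` are hypothetical helpers that only mimic the drop_last behaviour, and the data is made up. The point is just that dropping the last short batch changes which examples get scored.

```python
def iter_batches(items, batch_size, drop_last):
    """Yield consecutive batches; skip a trailing short batch if drop_last is set.

    Hypothetical stand-in for a dataloader, written only for this illustration.
    """
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            return  # the incomplete final batch is silently discarded
        yield batch


def accuracy(pairs, batch_size, drop_last):
    """Accuracy over whatever examples the batching actually yields."""
    correct = total = 0
    for batch in iter_batches(pairs, batch_size, drop_last):
        correct += sum(pred == label for pred, label in batch)
        total += len(batch)
    return correct / total


# Toy data (assumed): 10 (prediction, label) pairs, the last two are wrong.
pairs = [(1, 1)] * 8 + [(0, 1)] * 2

acc_full = accuracy(pairs, batch_size=4, drop_last=False)    # scores all 10 examples
acc_dropped = accuracy(pairs, batch_size=4, drop_last=True)  # scores only the first 8
print(acc_full, acc_dropped)
```

With a batch size of 4 and 10 examples, the last batch holds only 2 examples, so drop_last skips exactly the two wrong predictions and the two numbers disagree. If your manual loop and the Trainer handle the last batch differently, that alone could explain a gap, though the size of the gap depends on your dataset and batch size.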