Shockingly Incorrect Evaluate Function in Huggingface API

I get two different accuracies, one from manual evaluation and one from Hugging Face's Trainer. They are completely different. I am shocked.

You are setting drop_last in your eval dataloader, which in general you should never do, because the examples in the incomplete final batch are then never evaluated. Does removing it fix this?
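To illustrate why this matters, here is a minimal sketch in plain Python that mimics what dropping the last incomplete batch does to an accuracy number. The helpers `batches` and `accuracy` and the toy data are hypothetical, written only for this illustration. They are not part of PyTorch or the Trainer, and this may not be the only cause of the gap you are seeing.

```python
def batches(items, batch_size, drop_last):
    """Yield consecutive batches; optionally discard a short final batch."""
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            return  # the incomplete final batch is silently skipped
        yield batch


def accuracy(preds, labels, batch_size, drop_last):
    """Return (accuracy, number of examples actually evaluated)."""
    correct = total = 0
    pairs = list(zip(preds, labels))
    for batch in batches(pairs, batch_size, drop_last):
        correct += sum(p == y for p, y in batch)
        total += len(batch)
    return correct / total, total


# Toy data (made up): 10 examples, the last two are misclassified.
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
preds = [0, 1, 0, 1, 0, 1, 0, 1, 1, 0]

acc_full, n_full = accuracy(preds, labels, batch_size=4, drop_last=False)
acc_drop, n_drop = accuracy(preds, labels, batch_size=4, drop_last=True)

print(f"all batches kept:   accuracy={acc_full}, evaluated={n_full}")
print(f"last batch dropped: accuracy={acc_drop}, evaluated={n_drop}")
```

With a batch size of 4, the last two examples never reach the metric when the final batch is dropped, so the two runs score different subsets of the data. A quick check on your side is to compare how many examples each evaluation path actually sees.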