Trainer doesn't call compute_metrics during evaluation

No, sadly. I had to write my own evaluation methods.
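
For anyone in the same spot, here is a minimal sketch of what a hand-written evaluation method can look like. It is not the code I actually used, and it does not touch the Trainer at all: `predict_fn`, `evaluate`, and the toy data are hypothetical placeholders, and `compute_metrics` here is a plain accuracy function that takes predictions and labels directly. In a real setup `predict_fn` would wrap your model's forward pass and return predicted class ids.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def compute_metrics(predictions: Sequence[int], labels: Sequence[int]) -> Dict[str, float]:
    """Hypothetical metric function: plain accuracy over predicted class ids."""
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels) if labels else 0.0}


def evaluate(
    predict_fn: Callable[[Sequence[float]], List[int]],
    batches: Iterable[Tuple[Sequence[float], Sequence[int]]],
) -> Dict[str, float]:
    """Run predict_fn over every batch, collect the outputs, then score them once."""
    all_preds: List[int] = []
    all_labels: List[int] = []
    for inputs, labels in batches:
        all_preds.extend(predict_fn(inputs))
        all_labels.extend(labels)
    # Call the metric function explicitly instead of relying on the Trainer to do it.
    return compute_metrics(all_preds, all_labels)


def toy_predict(inputs: Sequence[float]) -> List[int]:
    """Stand-in for a model (assumption): predicts class 1 for positive inputs."""
    return [int(x > 0) for x in inputs]


# Toy evaluation set: two batches of (inputs, labels).
toy_batches = [([-1.0, 2.0], [0, 1]), ([3.0, -4.0], [1, 1])]
metrics = evaluate(toy_predict, toy_batches)
print(metrics)
```

The point of this shape is that predictions are gathered across all batches first and the metric is computed once at the end, so the result does not depend on batch size.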