Evaluation loss depends on batch size

Apparently a known but neglected issue for several years…
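
The post does not say where the dependence comes from, so the following is only a sketch of one common mechanism, not a diagnosis of any particular library. It assumes the evaluation loop takes the mean token loss of each batch and then averages those batch means with equal weight, while sequences differ in length. The function names and the loss values are hypothetical and chosen purely for illustration.

```python
from typing import List


def mean_of_batch_means(token_losses: List[List[float]], batch_size: int) -> float:
    """Average the per-batch mean token loss, giving every batch equal weight."""
    batch_means = []
    for start in range(0, len(token_losses), batch_size):
        batch = token_losses[start:start + batch_size]
        flat = [loss for seq in batch for loss in seq]
        batch_means.append(sum(flat) / len(flat))
    return sum(batch_means) / len(batch_means)


def token_weighted_mean(token_losses: List[List[float]], batch_size: int) -> float:
    """Accumulate loss sums and token counts across batches, divide once at the end."""
    total = 0.0
    count = 0
    for start in range(0, len(token_losses), batch_size):
        batch = token_losses[start:start + batch_size]
        for seq in batch:
            total += sum(seq)
            count += len(seq)
    return total / count


# Hypothetical per-token losses for four sequences of unequal length.
losses = [[1.0], [3.0, 3.0, 3.0], [2.0, 2.0], [4.0]]

for bs in (1, 2, 4):
    print(bs, mean_of_batch_means(losses, bs), token_weighted_mean(losses, bs))
```

Under these assumptions the first column of results changes with the batch size, because a batch with few tokens counts as much as a batch with many. The second stays fixed, since every token contributes exactly once to a single global average.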