Metrics during training and after training are different

I train a model with the transformers `Trainer`.
The metrics log:

| Step | Training Loss | Validation Loss | F1       |
|------|---------------|-----------------|----------|
| 100  | 0.245300      | 0.196128        | 0.925541 |
| 200  | 0.179600      | 0.307377        | 0.900455 |
| 300  | 0.167200      | 0.267722        | 0.903617 |
| 400  | 0.071900      | 0.135958        | 0.959697 |
| 500  | 0.129800      | 0.204229        | 0.948185 |
| 600  | 0.034900      | 0.220226        | 0.943453 |
| 700  | 0.075300      | 0.209451        | 0.941723 |
| 800  | 0.061700      | 0.152700        | 0.946293 |
| 900  | 0.051100      | 0.124361        | 0.959392 |
| 1000 | 0.046900      | 0.156711        | 0.959658 |
| 1100 | 0.077000      | 0.167861        | 0.955440 |
| 1200 | 0.080700      | 0.147973        | 0.971185 |
| 1300 | 0.038900      | 0.139029        | 0.962280 |
| 1400 | 0.044100      | 0.154502        | 0.967336 |

After training, the best model was loaded, according to the log:

```
Loading best model from ./results_swin/checkpoint-1200 (score: 0.9711846590298694).
```

Then I run prediction on the validation part of the dataset, the same split I used as `eval_dataset` during training:

```python
val_preds = trainer.predict(dataset['test'])
val_preds.metrics['test_f1']
```

But the resulting metric is noticeably lower than the 0.971185 reported for checkpoint-1200:

```
0.957392790742415
```
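One thing I wondered about is whether the gap could come from how the F1 is aggregated: averaging a per-batch F1 is not the same as computing F1 once over the pooled predictions. A minimal stdlib-only sketch with made-up counts (the batch counts and the `f1` helper are hypothetical, not from my training code):

```python
def f1(tp, fp, fn):
    # Standard F1 from confusion counts: harmonic mean of precision and recall.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Two hypothetical evaluation batches with different error profiles.
batches = [
    {"tp": 3, "fp": 0, "fn": 1},  # batch 1: perfect precision, one miss
    {"tp": 1, "fp": 3, "fn": 0},  # batch 2: perfect recall, many false alarms
]

# Averaging the per-batch F1 scores...
per_batch = sum(f1(**b) for b in batches) / len(batches)

# ...vs computing F1 once over the pooled counts.
pooled = f1(
    tp=sum(b["tp"] for b in batches),
    fp=sum(b["fp"] for b in batches),
    fn=sum(b["fn"] for b in batches),
)

print(round(per_batch, 4))  # 0.6286
print(round(pooled, 4))     # 0.6667
```

The two numbers disagree whenever batches have different precision/recall trade-offs, so if the logged F1 and the `predict` F1 were aggregated differently, a mismatch like mine would be expected. I'm not sure this is what `Trainer` actually does, though.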

Why did that happen?