Measure statistical significance between models

Hello,

I have trained several BERT models and now want to check whether the results from one model are significantly better than the results from another.

What would be an approach to estimate this?

Thanks!
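
One option for this kind of comparison is a paired permutation (approximate randomization) test on per-example results. The sketch below is only an illustration under assumptions not stated in the post: both models were evaluated on the same test examples, and for each example you have a 0/1 flag saying whether the model got it right. The function name `paired_permutation_test` and its parameters are hypothetical, not from any library. It does not account for variance across fine-tuning seeds, which may matter for BERT models.

```python
import numpy as np


def paired_permutation_test(correct_a, correct_b, n_resamples=10_000, seed=0):
    """Two-sided paired permutation test on per-example scores.

    correct_a, correct_b: per-example scores (e.g. 1.0 if correct, else 0.0)
    for model A and model B on the SAME test examples, in the same order.
    Returns (observed mean difference A - B, approximate p-value).
    """
    a = np.asarray(correct_a, dtype=float)
    b = np.asarray(correct_b, dtype=float)
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("Both inputs must be 1-D arrays of equal length.")

    rng = np.random.default_rng(seed)
    diffs = a - b
    observed = diffs.mean()

    # Under the null hypothesis the two models are interchangeable, so
    # swapping their scores on any example (flipping the sign of the
    # difference) is equally likely.
    signs = rng.choice([-1.0, 1.0], size=(n_resamples, diffs.size))
    permuted = (signs * diffs).mean(axis=1)

    # Count resamples at least as extreme as the observed difference.
    # The small tolerance guards against floating-point ties; the +1
    # terms keep the estimated p-value strictly above zero.
    extreme = np.sum(np.abs(permuted) >= abs(observed) - 1e-12)
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value
```

To use it, pass the two per-example correctness arrays, for example `diff, p = paired_permutation_test(correct_a, correct_b)`. A small p-value suggests the accuracy gap is unlikely to be due to which examples happened to be in the test set. The sign matrix holds `n_resamples * len(correct_a)` values, so lower `n_resamples` for very large test sets.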