Statistical significance between BERT models


For a university research project, I am trying out some baseline models and trying to improve on their results with specific approaches.
For this, I am running 5-fold cross-validation on the dataset: I train each baseline model 5 times (once per fold) and my own approach 5 times as well. Afterwards, I want to run a t-test to check whether the difference in performance is statistically significant.
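A minimal sketch of the paired t-test described above, using only the Python standard library. It assumes both models are evaluated on exactly the same 5 folds and that one score (e.g. F1) is recorded per fold. The scores below are hypothetical placeholders, not real results, and the hard-coded critical value is only valid for 5 folds (4 degrees of freedom).

```python
import math
import statistics

# Two-sided critical value of Student's t at alpha = 0.05 with 4 degrees
# of freedom (5 folds -> 5 - 1 = 4). Only valid for exactly 5 folds.
T_CRIT_DF4 = 2.776


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic over per-fold scores (same folds for both models)."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need the same number (>= 2) of folds for both models")
    # Per-fold difference: model B minus model A.
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences are identical on every fold; t is undefined")
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


# Hypothetical per-fold F1 scores, NOT real results.
baseline = [0.81, 0.79, 0.83, 0.80, 0.82]
approach = [0.84, 0.82, 0.85, 0.81, 0.85]

t = paired_t_statistic(baseline, approach)
print(f"t = {t:.3f}, significant at 5%: {abs(t) > T_CRIT_DF4}")
```

If SciPy is available, `scipy.stats.ttest_rel(approach, baseline)` should give the same t statistic together with a p-value. Note that the five training sets overlap, so the fold scores are not independent and this plain test tends to be overconfident; the corrected resampled t-test of Nadeau and Bengio is a commonly cited adjustment.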

Since this is very time-consuming, is there another way to check whether one model performs significantly better than another?

Thanks in advance!