Hello,
I have trained several BERT models to evaluate which of them performs best on my dataset.
At the moment my problem is that I get different results every time I run a model again (e.g. in a new Colab session).
For a reproducible evaluation I want to always get the same results (e.g. F1), but I don't know how to do this.
I have ensured that my train/test/validation split is always the same.
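To make clear what I mean by a fixed split, here is a simplified sketch of the idea. It is not the exact notebook code: the helper name `fixed_split`, the seed value 42, and the 80/10/10 fractions are made up for illustration.

```python
import random


def fixed_split(items, seed=42, train_frac=0.8, val_frac=0.1):
    """Split items into train/val/test the same way on every run.

    Hypothetical helper for illustration; seed and fractions are assumptions.
    """
    # A private, seeded RNG: the shuffle order depends only on `seed`,
    # not on the global random state or on the session.
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # everything that is left over
    return train, val, test


train, val, test = fixed_split(range(100))
```

So the data each model sees is identical between runs, and the differences in the results must come from somewhere else.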
Here is an example notebook: Google Colab
Thanks in advance!