@sgugger How “smart” is this feature? I remember that in OpenNMT, specifying a maximum number of checkpoints does not take the best evaluated checkpoint so far into account. In other words, if the best checkpoint happens to be an old one, it can still get deleted. That should not happen, so I am wondering whether the implementation in the transformers Trainer is a bit smarter?
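To make the concern concrete, here is a minimal sketch of the “smarter” rotation behavior I would hope for: delete the oldest checkpoints beyond the limit, but always exempt the best one. The function name, arguments, and checkpoint naming below are illustrative, not the actual Trainer API.

```python
import shutil


def rotate_checkpoints(checkpoints, save_total_limit, best_checkpoint=None):
    """Delete the oldest checkpoints beyond `save_total_limit`.

    `checkpoints` is a list of checkpoint directories ordered oldest -> newest;
    `best_checkpoint`, if given, is never deleted, even if it is the oldest.
    (Hypothetical helper for illustration, not the transformers implementation.)
    """
    if save_total_limit is None or len(checkpoints) <= save_total_limit:
        return checkpoints
    # Treat the best checkpoint as if it were the newest, so the
    # oldest-first deletion loop can never reach it.
    ordered = [c for c in checkpoints if c != best_checkpoint]
    if best_checkpoint is not None and best_checkpoint in checkpoints:
        ordered.append(best_checkpoint)
    n_to_delete = max(0, len(ordered) - save_total_limit)
    for stale in ordered[:n_to_delete]:
        shutil.rmtree(stale, ignore_errors=True)
    return ordered[n_to_delete:]
```

With `save_total_limit=2` and an old best checkpoint, this keeps the newest checkpoint plus the best one, deleting only the non-best stale directories.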