How can I prevent run_clm.py from saving too many checkpoints?

@sgugger @BramVanroy @JacquesThibs How can we load the training data lazily, instead of holding the whole dataset in RAM, when training a model from scratch?
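For context, the Hugging Face `datasets` library already keeps datasets in memory-mapped Arrow files on disk rather than fully in RAM, and `load_dataset(..., streaming=True)` returns an `IterableDataset` that reads examples on the fly. The plain-Python sketch below only illustrates the underlying idea of lazy loading; it is not how `datasets` or run_clm.py implement it. It reads a text file one line at a time, tokenizes each line, and yields fixed-size blocks of token ids, so memory use stays bounded by roughly one line plus one block. `toy_tokenize` is a hypothetical stand-in for a real tokenizer, and `block_size` plays the same role as run_clm.py's `--block_size`.

```python
import os
import tempfile
from typing import Callable, Iterator, List


def iter_lines(path: str) -> Iterator[str]:
    """Yield non-empty, stripped lines one at a time; the file is never read whole."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield line


def lazy_blocks(
    path: str, tokenize: Callable[[str], List[int]], block_size: int
) -> Iterator[List[int]]:
    """Concatenate token ids across lines and yield blocks of exactly block_size ids.

    Only the current line's tokens plus a leftover of fewer than block_size ids
    are buffered. The final remainder shorter than block_size is dropped.
    """
    buffer: List[int] = []
    for line in iter_lines(path):
        buffer.extend(tokenize(line))
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]


def toy_tokenize(text: str) -> List[int]:
    """Hypothetical tokenizer for illustration: maps each word to its length."""
    return [len(word) for word in text.split()]


# Small demo on a temporary file (the blank line is skipped).
with tempfile.NamedTemporaryFile(
    "w", delete=False, suffix=".txt", encoding="utf-8"
) as tmp:
    tmp.write("a bb ccc\n\ndddd e\nff\n")
    demo_path = tmp.name

blocks = list(lazy_blocks(demo_path, toy_tokenize, block_size=2))
os.remove(demo_path)
print(blocks)  # [[1, 2], [3, 4], [1, 2]]
```

Because `lazy_blocks` is a generator, it can be wrapped in any iterable-style dataset so a training loop pulls blocks as needed instead of materializing the full tokenized corpus.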