How to train a ControlNet with a very large dataset

Hey! I'm currently trying to train a ControlNet on SDXL. The problem is that my dataset (300k+ images) is too large to fit on my machine (256 GB), and I'm also on a multi-GPU machine. What is the best way to work around this when using the training script from the diffusers library? I've tried many things but am struggling to get it to work. If anybody could help, that would be amaaaazing!
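One direction that might help: store the data as many shard files (for example in remote storage), and have each GPU process stream only its own disjoint subset of shards, one shard at a time. That way the full dataset never has to sit on local disk at once. Hugging Face `datasets` supports a similar pattern with `load_dataset(..., streaming=True)` plus `datasets.distributed.split_dataset_by_node`, but the diffusers ControlNet SDXL script may need modifying to accept a streamed dataset (an assumption, I haven't verified this). The plain-Python sketch below only shows the sharding idea. `shards_for_rank`, `stream_examples`, and the `read_shard` loader are hypothetical names, not part of diffusers.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def shards_for_rank(shard_paths: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Return the shard files this process (one per GPU) should read.

    Shards are dealt round-robin, so ranks get disjoint subsets that
    together cover every shard exactly once.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} out of range for world_size {world_size}")
    return list(shard_paths[rank::world_size])


def stream_examples(
    shard_paths: Sequence[str],
    rank: int,
    world_size: int,
    read_shard: Callable[[str], Iterable[T]],
) -> Iterator[T]:
    """Lazily yield examples from this rank's shards, one shard at a time.

    Only the shard currently being read has to be materialized, so the
    whole dataset never needs to be on local disk or in RAM at once.
    """
    for path in shards_for_rank(shard_paths, rank, world_size):
        # `read_shard` is a hypothetical loader, e.g. one that downloads a
        # tar/parquet shard and yields (image, conditioning image, caption) rows.
        yield from read_shard(path)


if __name__ == "__main__":
    # Fake "remote storage": 5 shards with 3 examples each, split across 2 ranks.
    fake_store = {f"shard-{i:05d}": [f"img{i}-{j}" for j in range(3)] for i in range(5)}
    paths = sorted(fake_store)
    for rank in range(2):
        examples = list(stream_examples(paths, rank, 2, fake_store.__getitem__))
        print(rank, len(examples))
```

In the demo, rank 0 gets three shards and rank 1 gets two, so they see different numbers of examples. With DistributedDataParallel, ranks that run different numbers of steps can hang at a collective op, so it's probably safest to use a shard count divisible by the number of GPUs (or cap every rank at the same number of steps).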