As you say, I don’t want to recommend it either. The slowdown and difficulty of implementation are unavoidable overheads, and this is a last resort. However, your dataset is probably too large even in an environment with a lot of RAM…
There may be a way to train it by slicing the dataset itself in advance simply and feeding it in small amounts without streaming.