How does Dataset.from_generator store data bigger than RAM?

It seems like it could be done using the writer_batch_size parameter, but I’m not sure how to use it specifically…

By default, we write the generated data to an Arrow file on disk (so it can be memory-mapped) every 1000 rows/samples, so only the current batch has to be held in RAM at any time. You can control this with the writer_batch_size parameter.
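
Here is a minimal sketch of how that might look, assuming the `datasets` library is installed; the generator function `gen` and the batch size of 100 are just illustrative values, not anything specific to your setup:

```python
from datasets import Dataset

def gen():
    # Yield one example at a time; the whole dataset never has to fit in RAM,
    # because rows are flushed to an on-disk Arrow file in batches.
    for i in range(1_000_000):
        yield {"id": i, "text": f"example {i}"}

# Illustrative value: flush to disk every 100 rows instead of the default 1000,
# trading some write speed for a lower peak memory footprint.
ds = Dataset.from_generator(gen, writer_batch_size=100)
```

A smaller writer_batch_size keeps less data in memory between flushes (useful if individual rows are large, e.g. images or audio), while a larger one reduces write overhead.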