Space Connection Error - Dataset Map

How can I load a “tokenized dataset” from a dataset repo?
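For reference, this is roughly what I have in mind — a minimal sketch, assuming the tokenized dataset was pushed to a Hub dataset repo (the repo name `your-username/my-tokenized-dataset` is just a placeholder):

```python
from datasets import load_dataset

# load_dataset pulls the saved Arrow/Parquet files from the dataset repo
# and restores the tokenized columns (e.g. input_ids, attention_mask) as saved.
tokenized = load_dataset("your-username/my-tokenized-dataset", split="train")
print(tokenized.column_names)  # e.g. ['input_ids', 'attention_mask', 'labels']
```

(If it was saved locally with `save_to_disk` instead, `datasets.load_from_disk` would be the counterpart.)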

I’ve found a way to save it. However, I think it would be more reliable to split the dataset into several parts and run the training over several sessions, if possible — something like the sketch below. It would be difficult if the data cannot be divided due to its nature…
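A minimal sketch of that splitting idea, using `Dataset.shard` to get non-overlapping parts (repo name and shard count are illustrative assumptions):

```python
from datasets import load_dataset

num_shards = 4
full = load_dataset("your-username/my-tokenized-dataset", split="train")

for i in range(num_shards):
    # shard() returns a deterministic, non-overlapping slice of the dataset
    part = full.shard(num_shards=num_shards, index=i)
    # each part can then be saved locally (or pushed) and trained on in its own session
    part.save_to_disk(f"tokenized_part_{i}")
```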

Another option would be to customize the Trainer’s data collator to incorporate the tokenizer, and use the Trainer with an IterableDataset.
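A minimal sketch of that second option, assuming a text-classification setup — the model, dataset name, and column names (`text`, `label`) are illustrative, and tokenization happens inside the collator so the raw data can be streamed:

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

class TokenizingCollator:
    """Collator that tokenizes raw text rows at batch time."""
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, examples):
        # examples is a list of raw dataset rows; tokenize them here
        batch = self.tokenizer(
            [ex["text"] for ex in examples],
            padding=True, truncation=True, return_tensors="pt",
        )
        batch["labels"] = torch.tensor([ex["label"] for ex in examples])
        return batch

# streaming=True yields an IterableDataset, so nothing is tokenized up front
stream = load_dataset("imdb", split="train", streaming=True)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    max_steps=100,                 # required: an IterableDataset has no length
    remove_unused_columns=False,   # keep the raw "text" column for the collator
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=stream,
    data_collator=TokenizingCollator(tokenizer),
)
trainer.train()
```

With this layout the Hub repo only needs to hold the raw text, and each training session tokenizes on the fly as batches are drawn from the stream.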