Great, thanks for confirming that it doesn’t cache to disk. That’s exactly what I was hoping.
I guess now I’ll have to update to the latest master branches and start testing how much on-the-fly tokenization and other data transforms slow my training down.
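In case it's useful to anyone following along, here's roughly how I plan to measure that overhead. This is just a minimal sketch: the corpus, the toy whitespace tokenizer, and the dataset/loader names are all placeholders standing in for the real tokenizer and transforms, and it only compares an on-the-fly `__getitem__` against a pre-tokenized pass:

```python
import time
from torch.utils.data import Dataset, DataLoader

# Hypothetical corpus; a real run would use the actual training data.
CORPUS = ["the quick brown fox jumps over the lazy dog"] * 10_000


class OnTheFlyDataset(Dataset):
    """Tokenizes each sample at access time (the on-the-fly path)."""

    def __len__(self):
        return len(CORPUS)

    def __getitem__(self, idx):
        # Toy whitespace split; swap in the real tokenizer/transforms here.
        return CORPUS[idx].split()


class PretokenizedDataset(Dataset):
    """Tokenizes everything once up front, as a baseline for comparison."""

    def __init__(self):
        self.samples = [line.split() for line in CORPUS]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


def time_epoch(dataset, num_workers=0):
    # Identity collate_fn so we time only data access, not tensor collation.
    loader = DataLoader(dataset, batch_size=32, num_workers=num_workers,
                        collate_fn=lambda batch: batch)
    start = time.perf_counter()
    for _ in loader:
        pass
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"on-the-fly:   {time_epoch(OnTheFlyDataset()):.3f}s")
    print(f"pretokenized: {time_epoch(PretokenizedDataset()):.3f}s")
```

The gap between the two numbers (and how it shrinks as `num_workers` goes up) should give a decent first read on whether the on-the-fly transforms are actually the bottleneck.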