Pipeline with custom dataset tokenizer: when to save/load manually

The idea is that you can write simple, readable code once and not worry about it redoing the downloading and pre-processing steps when you run it several times, because all of these operations are cached automatically.