Tokenize iterable dataset

surya-narayanan, June 7, 2023, 12:21am

From what I understand, it’s better to set batch_size=1 when mapping a tokenize function over an iterable dataset, right? Or rather, to call map with batched=False?
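To make the two options in the question concrete, here is a minimal plain-Python sketch of per-example versus batched mapping over a stream. It is only an illustration and not the 🤗 Datasets implementation: `toy_tokenize`, `toy_tokenize_batch`, `stream`, `map_unbatched`, and `map_batched` are hypothetical names, and the whitespace tokenizer is a stand-in for a real one. In this sketch both variants stay lazy; the batched one just buffers up to `batch_size` examples before calling the function.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List


def toy_tokenize(text: str) -> List[str]:
    # Hypothetical stand-in for a real tokenizer: lowercase + whitespace split.
    return text.lower().split()


def toy_tokenize_batch(texts: List[str]) -> List[List[str]]:
    # Batched variant: takes a list of texts, returns one token list per text.
    return [toy_tokenize(t) for t in texts]


def stream() -> Iterator[Dict[str, str]]:
    # Hypothetical streamed dataset: yields one example at a time.
    for text in ["Hello world", "Streaming datasets are lazy", "One more row"]:
        yield {"text": text}


def map_unbatched(
    examples: Iterable[Dict[str, str]],
    fn: Callable[[str], List[str]],
) -> Iterator[dict]:
    # Analogue of per-example mapping: fn is called once per example.
    for ex in examples:
        yield {**ex, "tokens": fn(ex["text"])}


def map_batched(
    examples: Iterable[Dict[str, str]],
    fn_batch: Callable[[List[str]], List[List[str]]],
    batch_size: int,
) -> Iterator[dict]:
    # Analogue of batched mapping: buffer up to batch_size examples,
    # call fn_batch once per buffer, then yield the results one by one.
    it = iter(examples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        token_lists = fn_batch([ex["text"] for ex in batch])
        for ex, tokens in zip(batch, token_lists):
            yield {**ex, "tokens": tokens}


unbatched = list(map_unbatched(stream(), toy_tokenize))
batched = list(map_batched(stream(), toy_tokenize_batch, batch_size=2))
```

Both variants produce the same rows here; the difference is only how many examples reach the function per call, which is what matters when the real tokenizer can process a list of texts faster than one text at a time.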
