From what I understand, it's better to set `batch_size=1` when mapping a tokenize function over an iterable (streaming) dataset, right? Or rather, to process it with `batched=False`?
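For context, `map` in 🤗 Datasets calls your function differently depending on `batched`. With `batched=False` the function receives one example (a dict of scalars). With `batched=True` it receives a dict of lists holding up to `batch_size` examples. The sketch below does not use the `datasets` library. It is a hypothetical pure-Python stand-in (`map_stream`, `toy_tokenize_one` and `toy_tokenize_batch` are made-up names, and the whitespace "tokenizer" replaces a real one) that mimics the two calling conventions lazily over a stream, so you can see how many function calls each option makes.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

# Counts how often each hypothetical function is invoked.
CALLS = {"one": 0, "batch": 0}

def toy_tokenize_one(example: Dict[str, str]) -> Dict[str, List[str]]:
    """Per-example function (batched=False style): gets one dict of scalars."""
    CALLS["one"] += 1
    return {"tokens": example["text"].split()}

def toy_tokenize_batch(batch: Dict[str, List[str]]) -> Dict[str, List[List[str]]]:
    """Batched function (batched=True style): gets a dict of lists."""
    CALLS["batch"] += 1
    return {"tokens": [text.split() for text in batch["text"]]}

def map_stream(examples: Iterable[dict], fn: Callable,
               batched: bool = False, batch_size: int = 1000) -> Iterator[dict]:
    """Lazily apply fn to a stream of examples (hypothetical stand-in, not the library)."""
    if not batched:
        for ex in examples:
            yield {**ex, **fn(ex)}
        return
    it = iter(examples)
    while True:
        chunk = list(islice(it, batch_size))  # pull up to batch_size examples
        if not chunk:
            return
        # Turn a list of example dicts into a dict of column lists.
        columns = {key: [ex[key] for ex in chunk] for key in chunk[0]}
        out = fn(columns)
        # Split the batched outputs back into per-example dicts.
        for i, ex in enumerate(chunk):
            yield {**ex, **{key: values[i] for key, values in out.items()}}

stream = [{"text": f"sentence number {i}"} for i in range(10)]
unbatched = list(map_stream(stream, toy_tokenize_one))
batched = list(map_stream(stream, toy_tokenize_batch, batched=True, batch_size=4))
print(unbatched == batched)  # True
print(CALLS)                 # {'one': 10, 'batch': 3}
```

Both options produce the same rows. The batched version makes 3 calls (4 + 4 + 2 examples) instead of 10. With `batched=True, batch_size=1` it would also make 10 calls, and the function would still receive lists of length 1. Whether larger batches are actually faster depends on the function. Tokenizers that process a list of strings in a single call are the usual case where batching is expected to help.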