Hugging Face Forums
Processing Large Dataset for Training GPT2 model
🤗Datasets
lhoestq
April 6, 2023, 12:06pm
Hi! Have you tried increasing preprocessing_num_workers?