Topic | Replies | Views | Activity
Dataset.map hangs on tokenization (relatively small dataset) | 2 | 2003 | April 22, 2022
How to load a large hf dataset efficiently? | 5 | 2511 | January 22, 2024
Dataset format standards for chat-based, fine-tuned Llama models | 2 | 6316 | February 16, 2024
Map fails for more than 4 processes | 7 | 3826 | April 9, 2025
The Dataset Preview has been disabled on this dataset | 8 | 3596 | November 2, 2023
Loading Huge Image Dataset seems to take a lot of time | 7 | 3765 | May 16, 2022
Loading a dataset cached in a LocalFileSystem is not supported | 3 | 940 | July 23, 2025
Load_dataset(): how to skip Starting new HTTPS connection (1): storage.googleapis.com:443 | 6 | 3953 | April 3, 2023
Can't load exist dataset for evaluation | 4 | 830 | May 15, 2025
No space left on device | 3 | 5165 | January 25, 2021
Strange Error While Attempting to Load DataSet | 7 | 3649 | March 28, 2025
How to handle IterableDataset with HuggingFace trainer and num_workers in DDP setup | 8 | 3433 | September 16, 2025
EmptyDatasetError: The directory at xsum doesn't contain any data files | 1 | 2258 | December 11, 2022
pyarrow.lib.FloatArray: did not recognize Python value type when inferring an Arrow data type | 3 | 4999 | March 17, 2023
Huggingface-cli to load_dataset | 5 | 4054 | July 23, 2025
Max individual file size for LFS files is 46.6GB | 2 | 3200 | May 19, 2022
Getting PermissionError: [WinError 32] When Using Load_Dataset() | 4 | 4399 | January 19, 2021
Caching a dataset with map() when loaded with from_dict() | 3 | 2740 | March 22, 2023
Loading natural_questions | 6 | 3576 | December 12, 2023
Load Dataset and Save as Parquet | 3 | 4654 | January 7, 2025
Map() function freezes on large dataset | 8 | 3086 | September 10, 2023
Issue concatenating datasets | 3 | 4624 | January 3, 2023
Performance tips for shuffle and flatten_indices | 5 | 2115 | December 11, 2024
Deleting Duplicate Saved Datasets | 3 | 4590 | September 7, 2022
Multi-class Using Dataset | 5 | 3716 | February 10, 2023
Huggingface not accessible to china | 2 | 2944 | February 8, 2024
How to get size of a dataset? | 2 | 5184 | January 29, 2024
Dataset loading is not working | 2 | 5168 | September 13, 2022
Streaming dataset and cache | 5 | 3653 | August 4, 2023
Batching vs. Sharding a Large Dataset | 4 | 2249 | June 8, 2021
From where can I import the get_coco_api_from_dataset module? | 5 | 3631 | August 8, 2022
Protein/molecule datasets | 3 | 444 | October 18, 2022
HTTP 504: Gateway timeout error when pushing dataset | 8 | 2914 | March 3, 2025
Error EBUG:filelock:Attempting to acquire lock | 1 | 1088 | August 15, 2025
How exactly does datasets versioning work? | 5 | 3482 | July 27, 2022
How to duplicate a dataset? | 1 | 5979 | July 21, 2021
[urgent]Can you reconstruct datasets using the cache file (.arrow file)? | 5 | 1079 | August 27, 2021
How to handle streaming datasets with DDP? | 1 | 590 | January 28, 2024
Fastest way to do inference on a large dataset in huggingface? | 5 | 3378 | May 3, 2024
Compressing, saving, and loading datasets | 3 | 2309 | November 10, 2020
Parquet compression for image dataset | 5 | 3283 | December 7, 2023
Use SQL database as dataset? | 3 | 2239 | June 28, 2023
Finding number of tokens in dataset | 2 | 4553 | November 19, 2021
How can I multithreadedly download a HuggingFace dataset? | 2 | 1436 | December 19, 2023
KeyError: 'Field "builder_name" does not exist in table schema' | 5 | 1783 | January 20, 2022
Using "load_metric" offline in datasets | 2 | 4424 | August 24, 2021
How can I convert a loaded dataset in to a parquet file and save it to the S3 | 2 | 4418 | July 31, 2023
Datasets map tokenization throws OSError: No space left on device | 8 | 2543 | March 19, 2025
datasets.Dataset.map() idle processes when multiprocessing | 6 | 2876 | December 22, 2022
ConnectionError: Couldn't reach https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0/resolve/main/common_voice_13_0.py | 1 | 534 | July 12, 2024