Topic | Replies | Views | Activity
[Open-to-the-community] One week team-effort to reach v2.0 of HF datasets library | 292 | 13898 | October 30, 2022
How to split main dataset into train, dev, test as DatasetDict | 21 | 43180 | May 23, 2024
Image dataset best practices? | 9 | 17562 | January 15, 2023
How to combine local data files with an official 🤗 dataset | 15 | 3589 | April 7, 2021
Out of no where: requests.exceptions.ReadTimeout: HTTPSConnectionPool (host='huggingface.co', port=443): Read timed out | 13 | 33976 | July 29, 2024
Is there a size limit for dataset hosting | 11 | 14325 | August 24, 2023
Download only a subset of a split | 10 | 17271 | February 25, 2025
How to add a new column to a dataset | 11 | 36205 | October 3, 2023
IndexError: Invalid key: 16 is out of bounds for size 0 | 26 | 23481 | June 5, 2024
How do I create a Image Segmentation Dataset | 26 | 10443 | April 11, 2024
Huggingface dataset install | 13 | 2614 | January 15, 2025
HF Dataset + TensorFlow + Ragged Tensors (Object Detection) | 12 | 12451 | November 1, 2024
Save Data from Streamlit Session to Persist Changes to HF Datasets | 10 | 2336 | October 5, 2022
Dataset label format for multi-label text classification | 9 | 13388 | February 9, 2023
Map multiprocessing Issue | 31 | 17846 | July 16, 2024
Dataset map function takes forever to run! | 16 | 6931 | August 15, 2024
How to load this simple audio data set and use dataset.map without memory issues? | 12 | 4341 | December 10, 2024
DatasetGenerationError: An error occurred while generating the dataset | 9 | 24084 | September 13, 2023
Support of very large dataset? | 12 | 10439 | August 24, 2022
Dataset set_format | 11 | 10553 | November 24, 2024
Minhash Deduplication | 15 | 7515 | August 6, 2022
Dataset repo requires arbitrary Python code execution | 21 | 3038 | February 14, 2025
Pipeline with custom dataset tokenizer: when to save/load manually | 18 | 5647 | September 18, 2020
My Space is stuck on “Starting on T4” all day | 13 | 182 | July 15, 2025
ArrowTypeError: Expected bytes, got a 'float' object, when trying to make a dataset from a list of dicts | 10 | 11229 | May 13, 2024
RuntimeError: Error in void faiss::gpu::allocMemorySpace | 16 | 8568 | October 12, 2020
I got Authorization error | 12 | 9601 | January 11, 2024
Allow streaming of large datasets with image/audio | 18 | 3979 | May 30, 2022
Git push rejected | 20 | 6687 | December 16, 2024
Save `DatasetDict` to HuggingFace Hub | 12 | 7549 | October 20, 2023
Can't use datasets offline, even if I have uploaded the datasets to .cache dir | 10 | 8134 | December 1, 2022
How to use S3 path with `load_dataset` with streaming=True? | 11 | 7764 | November 23, 2022
How to load large dataset with streaming mode and prepare for training? | 10 | 4393 | November 3, 2023
Understanding set_transform | 10 | 7796 | March 9, 2021
Multiprocessing map taking too much memory footprint | 17 | 5916 | April 5, 2024
Extremely slow data loading of imagefolder | 9 | 2492 | January 4, 2024
How do I add things (rows) to an already saved dataset? | 9 | 6916 | August 8, 2024
How to create a new large Dataset on disk? | 10 | 3338 | July 6, 2022
Limitations of iterable datasets | 11 | 5652 | June 28, 2024
Saving dataset in the current state without cache | 9 | 5912 | March 17, 2022
How to deal with unpickable objects in map | 9 | 4564 | October 23, 2020
Map method to tokenize raises index error | 9 | 4289 | June 9, 2021
Problem pushing dataset to huggingface | 11 | 3647 | June 26, 2023
Slow processing with map when using deepspeed or fairscale | 10 | 3679 | June 25, 2021
How do I make a dataset for vision models? | 12 | 1689 | April 20, 2024
NotImplementedError when solidifying a streaming dataset | 11 | 2948 | November 23, 2023
Best way to access the cached transformation arrow file | 9 | 3147 | January 19, 2024
Generating Croissant Metadata for Custom Image Dataset | 12 | 482 | April 15, 2025
Fetching rows of a large Dataset by index | 10 | 1642 | March 15, 2021
How to efficiently convert a large parallel corpus to a Huggingface dataset to train an EncoderDecoderModel? | 10 | 2783 | October 28, 2022