| Slow Iteration speed (with and without keep_in_memory=True) | 3 | 1425 | March 14, 2023 |
| Multiprocessing and sharding when creating dataset from scratch using loading script | 2 | 1641 | November 4, 2022 |
| Not able to upload or download custom datasets | 3 | 252 | October 6, 2024 |
| Not declaring splits inside of dataset loading script | 2 | 1635 | July 28, 2022 |
| Load a COCO format database from disk for DETR | 4 | 225 | May 14, 2025 |
| HF Dataset to COCO format dataset | 5 | 1146 | December 31, 2023 |
| The datasets.map() method doesn't keep tensor format from `tokenizer` | 1 | 1935 | November 4, 2022 |
| Dataset revision number | 8 | 913 | May 6, 2024 |
| Download_custom method of StreamingDownloadManager not implemented | 8 | 910 | August 21, 2023 |
| Load the Flores Data set | 4 | 1220 | May 25, 2023 |
| Problems about loading and managing extreme large datasets (like SA1B) | 2 | 885 | July 13, 2023 |
| TypeError when applying map after set_format(type='torch') | 3 | 1354 | September 13, 2022 |
| Datasets - Streaming Output to Arrow? | 3 | 240 | October 23, 2024 |
| Trying to figure out when is a dataset stored in memory? | 4 | 1206 | June 29, 2023 |
| Why can't I upload a parquet file to my dataset? (Error: o?.throwIfAborted is not a function.) | 6 | 1010 | November 20, 2023 |
| Audio files view error | 7 | 941 | March 27, 2023 |
| Most efficient way to retrieve N rows for a subset of columns | 2 | 1535 | November 3, 2021 |
| ValueError: audio at <filename> doesn't have metadata in <path>/metadata.csv | 6 | 1000 | October 30, 2023 |
| DPR Context tokenization in a GPU | 4 | 1182 | September 25, 2020 |
| Load_dataset without saving cache files | 1 | 1863 | April 19, 2023 |
| Load dataset from a specific cache file | 3 | 1311 | February 26, 2024 |
| Processing Large Dataset for Training GPT2 model | 4 | 1168 | April 12, 2023 |
| Process "Downloading and preparing dataset json/default" doesnt proceed | 3 | 1298 | November 2, 2022 |
| Would it be possible to implement and Iterable dataset with streaming and fast resume (no need to skip batches) | 3 | 1296 | October 7, 2024 |
| Unable to resolve any data file after loading once | 1 | 1832 | December 21, 2021 |
| Does huggingface support load raw text dataset from hdfs? | 3 | 1291 | January 9, 2022 |
| Guidance Needed on Choosing the Right Dataset Format for Transformer Model Training | 1 | 1823 | December 8, 2023 |
| How to clean/audit your image data? | 1 | 1022 | April 21, 2023 |
| Common Voice 8.0.0 en using all available RAM | 7 | 908 | August 5, 2022 |
| Use load dataset to load a sample of the dataset | 3 | 1277 | May 24, 2021 |
| Exceeded maximum rows when load_dataset for JSON | 4 | 1139 | April 6, 2023 |
| Representing nested dictionary with different keys | 5 | 1030 | April 7, 2022 |
| Download only 1 of many parquet file | 2 | 259 | March 19, 2025 |
| Increased arrow table size by factor of ~2 | 5 | 1028 | November 28, 2022 |
| How to save a mapped dataset | 4 | 1115 | June 8, 2023 |
| Dataset.map saves list as numpy array instead of as list | 2 | 1438 | January 3, 2023 |
| Skip rows with datasets.Dataset.map() | 1 | 1749 | January 3, 2023 |
| Batch response: Too many password attempts while uploading the dataset files with lfs | 8 | 824 | August 29, 2023 |
| Multithreading with map | 1 | 978 | January 23, 2023 |
| Making an infinite IterableDataset | 6 | 165 | March 19, 2025 |
| Image segmentation of a kaggle dataset | 2 | 1412 | April 12, 2023 |
| Imagenet in datasets? | 2 | 1412 | November 9, 2021 |
| IndexError: list index out of range for loading_dataset | 1 | 1728 | February 15, 2022 |
| ValueError: Field 'ner_tags' from the JSON data of type list<item: string> is not compatible with ClassLabel. Compatible types are int64 and string | 7 | 863 | March 25, 2022 |
| Load dataset from cache in offline mode | 1 | 1718 | January 23, 2023 |
| Does `Dataset.map(..., batched=True, batch_size=N)` save the original order? | 2 | 1383 | June 28, 2024 |
| Ideal batch_size and writer_batch_size for datasets | 1 | 1693 | December 9, 2022 |
| Ds.map(): optimizing PIL Image processing as tensorflow tensor | 2 | 1373 | April 27, 2024 |
| Can't run trainer.predict(). ValueError: 'process_id' should be a number greater than 0 | 2 | 1368 | May 28, 2022 |
| Setting format of columns for nested dictionary datasets with set_format | 1 | 940 | May 10, 2021 |