| Topic | Replies | Views | Activity |
|---|---|---|---|
| Copy columns in a dataset and compute statistics for a column | 13 | 2015 | July 10, 2024 |
| Roadmap/timeline for dataset streaming | 9 | 2280 | July 5, 2021 |
| Any workaround for push_to_hub() limits? | 9 | 2244 | May 2, 2024 |
| Misunderstanding around creating audio datasets from Local files | 12 | 1775 | July 17, 2023 |
| Datasets.load_datasets fails | 12 | 873 | October 11, 2024 |
| Exceeded our hourly quotas for action while loading dataset to HF Hub | 9 | 1462 | November 7, 2023 |
| Efficiently slicing imagefolder dataset split | 9 | 1437 | December 16, 2022 |
| Sequence features - Class Label Cast_ | 9 | 1323 | July 4, 2023 |
| Convert_to_parquet fails for datasets with multiple configs | 13 | 535 | July 25, 2024 |
| Strange problems with datasets-server | 10 | 1055 | July 25, 2024 |
| Pickling issue using map | 9 | 175 | April 8, 2025 |
| Handling decoding errors such as UnidentifiedImageError | 10 | 874 | February 5, 2025 |
| Unable to Load Dataset Using `load_dataset` | 10 | 412 | March 11, 2025 |
| Is there a way to delete/hide a published Dataset with assigned DOI? | 11 | 268 | October 4, 2024 |
| Using datasets to open jsonl | 10 | 83 | July 2, 2025 |
| How to split Hugging Face dataset to train and test? | 5 | 55606 | January 24, 2023 |
| Can't import load_metric from datasets | 4 | 16485 | February 22, 2025 |
| ValueError: Invalid pattern: '**' can only be an entire path component | 6 | 7433 | June 13, 2025 |
| Class Labels for Custom Datasets | 4 | 18110 | June 2, 2022 |
| AttributeError: 'DatasetDict' object has no attribute 'train_test_split' | 4 | 20231 | August 5, 2023 |
| Converting string label to int | 5 | 15320 | August 8, 2023 |
| Hugdatafast: hugginface/nlp + fastai | 1 | 1512 | September 8, 2020 |
| Nlp 0.3.0 is out! | 3 | 846 | July 8, 2020 |
| Load Dataset from arrow file | 1 | 11673 | October 27, 2022 |
| Counting the number of training tokens in a scalable way | 0 | 2674 | June 10, 2022 |
| How can I grab the first N rows of a Dataset *as* a Dataset object? | 3 | 22892 | October 4, 2024 |
| Model inference on tokenized dataset | 2 | 6355 | March 22, 2023 |
| NotImplementedError when loading dataset with Streamlit | 8 | 10380 | June 16, 2025 |
| ModuleNotFoundError: No module named 'datasets' | 4 | 37225 | December 29, 2023 |
| Convert a list of dictionaries to hugging face dataset object | 4 | 19811 | December 7, 2023 |
| Save and load datasets | 2 | 40116 | August 16, 2021 |
| Error while downloading a repo from Hugging Face : Read timed out | 2 | 11075 | June 28, 2023 |
| Log multiple metrics while training | 5 | 11051 | March 15, 2022 |
| Split DataFrame into validation and train split | 2 | 6555 | April 11, 2022 |
| How to dealing with Data Imbalance | 2 | 6377 | July 28, 2020 |
| Nlp Datasets: speed-test vs Fastai | 6 | 1127 | August 24, 2020 |
| How to use Huggingface Trainer streaming Datasets without wrapping it with torchdata's IterableWrapper? | 1 | 4622 | October 30, 2022 |
| Visual exploration of the imagenet-1k dataset | 2 | 3081 | February 12, 2025 |
| ValueError: Unable to avoid copy while creating an array as requested | 5 | 5948 | December 27, 2024 |
| HugginFace dataset error: RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor | 3 | 11747 | May 6, 2022 |
| Why use batched=True in map function? | 2 | 7422 | May 17, 2022 |
| How to use `map` or similar when one row is mapped to multiple rows? | 1 | 2836 | July 20, 2021 |
| [Help wanted] Common Crawl needs help to be richer & more multilingual | 1 | 87 | January 27, 2025 |
| HuggingFace dataset: each element in list of batch should be of equal size | 3 | 10439 | August 10, 2023 |
| Remove a row/specific index from the dataset | 6 | 13543 | February 8, 2025 |
| Handling large image datasets | 7 | 1945 | June 30, 2022 |
| Datasets 'ChunkedEncodingError: ConnectionBroken' | 2 | 5459 | March 20, 2025 |
| Compatibility for numpy arrays | 7 | 5582 | April 27, 2021 |
| Specifying download directory for custom dataset loading script | 6 | 17724 | May 2, 2023 |
| `push_to_hub` a dataset dict with subsets and splits (e.g., GLUE) | 6 | 2721 | March 16, 2024 |