Using num_proc>1 in Dataset.map hangs
|
|
8
|
4128
|
August 19, 2024
|
Streaming dataset into Trainer: does not implement __len__, max_steps has to be specified
|
|
6
|
4571
|
March 21, 2023
|
Cannot install Faiss in Google Colab
|
|
5
|
2696
|
June 10, 2025
|
Can dataset.map accept multiple arguments like python map
|
|
3
|
5743
|
April 20, 2023
|
How to slice an already loaded Dataset?
|
|
2
|
5886
|
December 16, 2022
|
Confusion in splitting dataset (from imagefolder) into train, test and validation
|
|
2
|
5772
|
August 12, 2022
|
Json dump format for load_dataset
|
|
5
|
22403
|
September 5, 2024
|
KeyError: '__index_level_0__' error with datasets arrow_writer.py
|
|
3
|
8625
|
August 29, 2024
|
NLP dataset for ByteLevelTokenizer Training
|
|
1
|
2111
|
February 16, 2021
|
How to create custom ClassLabels?
|
|
3
|
7484
|
January 20, 2022
|
Dataset with no splits
|
|
4
|
3556
|
May 16, 2024
|
Why is simply accessing dataset features so slow?
|
|
3
|
3809
|
November 22, 2021
|
UnboundLocalError
|
|
2
|
23958
|
February 6, 2023
|
Push to Hub - HTTPS Connection Pool( host = 'huggingface.co', port = 443 )
|
|
5
|
16910
|
June 30, 2022
|
Efficiently slicing dataset
|
|
2
|
2382
|
December 22, 2022
|
Setting dataset feature value as numpy array
|
|
7
|
8022
|
November 14, 2023
|
Remove columns from streamable datasets doesn't work
|
|
3
|
6273
|
January 24, 2024
|
Type object 'Dataset' has no attribute 'from_pandas'
|
|
3
|
5942
|
April 17, 2023
|
Best practices for a large dataset
|
|
7
|
2316
|
May 6, 2025
|
Help understanding how to build a dataset for language as with the old TextDataset
|
|
7
|
12769
|
October 6, 2021
|
ArrowInvalid: Column 1 named id expected length 512 but got length 1000
|
|
4
|
15569
|
June 6, 2024
|
How to process tabular data for fine tuning LLMs
|
|
0
|
1091
|
November 24, 2023
|
Try to read arrow files get: Invalid: Not an Arrow file
|
|
3
|
2980
|
May 31, 2024
|
Using load_dataset.set_transform() function along with Trainer class
|
|
4
|
2627
|
April 26, 2021
|
Pyarrow failed to parse string
|
|
5
|
7318
|
August 19, 2023
|
How to disable caching in load_dataset()?
|
|
6
|
6634
|
July 10, 2024
|
How to download subset of a dataset scripted
|
|
6
|
6539
|
December 7, 2023
|
[Solved] Image dataset seems slow for larger image size
|
|
7
|
3423
|
December 16, 2021
|
Dataset access with `use_auth_token`
|
|
4
|
13566
|
June 10, 2023
|
Loading Custom Datasets
|
|
7
|
10723
|
May 25, 2021
|
How to use load_dataset to load a json file with all three splits?
|
|
2
|
9736
|
April 13, 2023
|
Imagenet-1k is not available in huggingface dataset hub
|
|
3
|
4558
|
October 26, 2022
|
NonMatchingSplitsSizesError
|
|
5
|
6462
|
September 13, 2023
|
Streaming batched data
|
|
4
|
3923
|
October 5, 2023
|
When calling load_metric ('rouge') what file is downloaded (and where do I find it)?
|
|
1
|
1892
|
April 22, 2022
|
How can I clean the dataset cache?
|
|
4
|
11725
|
March 1, 2024
|
Error loading Wikipedia Dataset
|
|
6
|
3043
|
July 5, 2023
|
'datasets.iterable_dataset.IterableDataset' to 'datasets.dataset_dict.DatasetDict'
|
|
3
|
2222
|
June 7, 2023
|
Error using datasets with pipeline for text generation
|
|
5
|
1018
|
December 30, 2024
|
Padding in datasets
|
|
6
|
5086
|
October 21, 2021
|
Flatten List of features
|
|
1
|
1666
|
April 7, 2022
|
Using Datasets, DataCollators and DataLoaders to create an NLP data pipeline
|
|
1
|
5192
|
June 21, 2023
|
Loading an imagenet-style image dataset with train/val directories
|
|
4
|
1802
|
August 12, 2022
|
How to customize the "User Access requests" message?
|
|
1
|
503
|
January 21, 2022
|
Create a dataset from generator
|
|
7
|
7938
|
January 30, 2024
|
Cannot load dataset on Kaggle
|
|
4
|
3150
|
August 16, 2023
|
How to sample dataset according to the index
|
|
2
|
12364
|
January 10, 2022
|
Error loading dataset
|
|
7
|
7274
|
May 13, 2022
|
Correct use of dataset.class_encode_column
|
|
1
|
2577
|
July 17, 2023
|
TypeError: Provided `function` which is applied to all elements of table returns a variable of type <class 'list'>
|
|
2
|
6481
|
February 28, 2024
|