Topic | Replies | Views | Activity
How to handle IterableDataset with HuggingFace trainer and num_workers in DDP setup | 3 | 103 | March 29, 2024
Dataset Preview error with a dataset script and parquet files | 3 | 435 | March 29, 2024
Dataset features change based on download | 0 | 22 | March 28, 2024
A couple of questions about interleave_datasets() | 7 | 302 | March 28, 2024
Datasets with custom python objects | 0 | 27 | March 28, 2024
How can I download a specific split of a dataset? | 0 | 24 | March 27, 2024
.gz supported or not supported? | 0 | 31 | March 27, 2024
Adding items to Dataset is slow compared to loading from Python list | 0 | 23 | March 27, 2024
Force stratification in split | 0 | 20 | March 27, 2024
Best practice loading images files | 3 | 910 | March 27, 2024
How to split main dataset into train, dev, test as DatasetDict | 19 | 30533 | March 27, 2024
Creating a Sequence of ClassLabel for multi-label and multi-class problems | 5 | 163 | March 26, 2024
Dataset map() creates lot of cache files | 6 | 3627 | March 26, 2024
Odd dataset.map() behavior with PyTorch dataloader | 2 | 42 | March 25, 2024
Adding to dataset end with ArrowInvalid: cannot construct ChunkedArray from empty vector and omitted type | 0 | 37 | March 24, 2024
Does the Dataset instance have a "batched reduce" style method? | 1 | 56 | March 22, 2024
Keeping IterableDataset node-wise split fixed during DDP | 7 | 685 | March 22, 2024
Download location | 3 | 52 | March 22, 2024
Finding datasets about IT skills | 1 | 122 | March 22, 2024
How can I download a sizable subset of a dataset | 0 | 39 | March 21, 2024
Specifying K-fold splits in a dataset | 1 | 69 | March 20, 2024
How to resolve file paths in a downloaded dataset? | 4 | 104 | March 20, 2024
Extremely slow Training split | 1 | 53 | March 20, 2024
Metadata CSV annotations for ImageFolder dataset | 2 | 83 | March 19, 2024
Enabling dataset viewer by coexistence of loading script and parquet files | 5 | 103 | March 18, 2024
When using Dataset.map to tokenize a dataset, the speed slows down as the progress approaches 100% | 2 | 211 | March 18, 2024
Loading images directly in data folder | 1 | 70 | March 18, 2024
Download_and_extract() file missing, but only for one split | 1 | 64 | March 18, 2024
Issue of multiprocessing in map function | 2 | 88 | March 18, 2024
`push_to_hub` a dataset dict with subsets and splits (e.g., GLUE) | 6 | 1364 | March 16, 2024