Common Voice: Load validated split Hugging Face data? | 4 | 1534 | June 5, 2023
Unable to properly map tensors to examples | 6 | 1292 | December 15, 2022
How to sample batches from multiple datasets? | 2 | 1965 | January 18, 2024
Unable to load CommonVoice latest version | 3 | 1696 | December 13, 2021
How to use large image-text datasets in hugging face hub without downloading for free | 6 | 1275 | November 12, 2023
`datasets.map` calls a function that requires a `transformers.PreTrainedModel` object - unpicklable object | 2 | 1944 | December 2, 2022
Efficient way to concatenate DatasetDict objects | 1 | 2372 | June 12, 2023
UnicodeDecodeError when loading Multi Lingual text file | 1 | 2372 | April 7, 2022
UTF-16 for datasets? | 4 | 1499 | June 21, 2023
HF Datasets not working with Language Modeling notebook | 2 | 1923 | May 2, 2021
Desired behavior when calling `shuffle` or `select` on `interleave_datasets` | 1 | 414 | July 20, 2021
How do I add custom metadata fields to datasets? | 1 | 1306 | June 21, 2023
Load a subset of a dataset | 2 | 1888 | April 19, 2023
Best practice loading images files | 3 | 1633 | March 27, 2024
How to create a custom dataset by loading text data from elasticsearch database on a remote server? | 5 | 1331 | May 31, 2024
Why load_dataset on Audiofolder with metadata is returning Filenotfound error | 6 | 1226 | August 18, 2023
Generate dataset with empty features | 2 | 1872 | May 17, 2023
Interleaving Iterable Dataset with num_workers > 0 | 3 | 1618 | April 11, 2023
Passing schema features to a load_dataset function | 4 | 1445 | August 26, 2021
Distributed data sampling for streaming | 2 | 1865 | October 4, 2023
Cannot install datasets library in conda | 1 | 1281 | June 23, 2024
Incrementally adding processed examples to a dataset | 4 | 1438 | June 23, 2022
Dataset map preprocess throws ArrowInvalid | 5 | 1308 | September 16, 2021
Create a dataset for translation | 4 | 1429 | December 14, 2023
[Semantic search with FAISS] Can't manage to format embeddings column to numpy format | 1 | 714 | December 8, 2021
Huggingface Vision Dataset - the right way to use it? | 5 | 1287 | July 11, 2022
How to get the number of samples in a dataset without downloading the whole dataset? | 3 | 1576 | September 4, 2023
Common voice dataset 15.0 version release | 1 | 1253 | October 3, 2023
How to filter samples on-the-fly? | 1 | 2228 | February 15, 2022
Wav2vec2 pretraining on own wav files | 2 | 1019 | April 24, 2022
Parquet image dataset | 6 | 1186 | July 10, 2024
Can't use ImageFolder | 3 | 1567 | June 25, 2022
How do you save an IterableDataset to disk? | 3 | 878 | November 18, 2024
"Too many open files" when loading Common Voice | 4 | 1386 | February 8, 2022
`load_from_cache_file` not working | 1 | 2189 | May 10, 2021
How can I run it on Linux with GLIBC 2.27 | 2 | 1783 | October 31, 2023
IndexError using save_to_disk | 3 | 1538 | February 1, 2024
Getting list of tensors instead of tensor array after using set_format | 1 | 2165 | November 30, 2021
Apply same transform to pixel_values and labels for semantic segmentation | 1 | 2158 | March 31, 2022
Getting pyarrow.lib.ArrowInvalid: Column 2 named start_positions expected length 1000 but got length 1 | 1 | 2130 | July 27, 2023
Japanese keyword audio dataset | 3 | 267 | April 1, 2025
Local dataset loading performance: HF's arrow vs torch.load | 5 | 1218 | November 24, 2024
Item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()} AttributeError: 'list' object has no attribute 'items' | 1 | 2092 | December 10, 2021
Create multiple dataset configs with `push_to_hub()` method? | 1 | 657 | November 3, 2022
Why dataset iterating is so slow? | 1 | 2066 | January 3, 2023
Data files not working with custom loading script and dataset | 3 | 1445 | May 2, 2023
Can I upload a dataset of old VHS recordings of music videos? | 5 | 663 | July 19, 2023
ArrowBasedBuilder versus GeneratorBasedBuilder | 4 | 408 | February 8, 2025
Dataset slow during model training | 1 | 2029 | June 13, 2022
Can't iterate a DataLoader | 3 | 1429 | February 25, 2022