Question about ROOTS corpus: availability & earlier web data | 1 | 2 | September 24, 2025
403 error on dataset fineweb-2 | 1 | 9 | September 24, 2025
How to access path to audio recording in datasets 4.0? | 1 | 22 | September 17, 2025
Upload_large_folder for uploading to a PR | 2 | 15 | September 17, 2025
How to handle IterableDataset with HuggingFace trainer and num_workers in DDP setup | 8 | 3418 | September 16, 2025
[New Dataset Release] Bagpipes - Scottish Highland Bagpipes (Preview Pack, v0.9) | 0 | 9 | September 16, 2025
Add Low-Light Enhancement Datasets (e.g., LSRW-Nikon & LSRW-Huawei) | 2 | 6 | September 16, 2025
How to use split_dataset_by_node and shuffle on iterable dataset | 5 | 722 | September 15, 2025
Streaming for Saving | 3 | 57 | September 12, 2025
What’s the definition of lazy loading? Is IterableDataset also faster than Dataset when loading locally? | 5 | 20 | September 12, 2025
Interest in a Real DeFi Trading Dataset with Microstructure Details? | 4 | 38 | September 11, 2025
Error in https://huggingface.co/learn/llm-course/chapter3/2?fw=pt#preprocessing-a-dataset | 3 | 15 | September 4, 2025
Create batch from list of ids in the dataset is very slow | 5 | 873 | September 4, 2025
How to get approved to get access on OASIS 3 dataset | 0 | 8 | September 3, 2025
Change metadata of parquet files | 3 | 37 | September 2, 2025
Feb 2025 CriteoPrivateAd dataset – when were the logs collected? | 1 | 15 | September 2, 2025
Missing dataset after PapersWithCode migration | 3 | 66 | August 29, 2025
Missing dataset card - Reddit-TIFU dataset | 6 | 29 | August 23, 2025
`save_to_disk` saving ALL data, even items I filtered out | 2 | 24 | August 21, 2025
DatasetInfo seems to be missing when I pull my dataset from HFHub | 2 | 59 | August 21, 2025
[New Dataset Release] Scottish Smallpipes in A (Preview Pack, v0.9) | 0 | 13 | August 20, 2025
How do you collect and structure data for an AI after-sales (SAV) agent in banking/insurance? | 1 | 22 | August 18, 2025
TikTok-10M Dataset | 5 | 73 | August 17, 2025
Error DEBUG:filelock:Attempting to acquire lock | 1 | 1084 | August 15, 2025
Dataset flagged as unsafe due to false positive - how to resolve? | 5 | 60 | August 14, 2025
Vector Database | 0 | 27 | August 7, 2025
Error converting np float32 | 3 | 24 | August 5, 2025
Looking for datasets with paragraph/scene-level genre labels (e.g., action, romance, dialogue) | 0 | 20 | August 5, 2025
Open Discord Chat Dataset (+ Model): Internet Tone Dataset for LLMs and ML | 0 | 15 | August 5, 2025
Error Uploading Large Folder | 5 | 71 | August 3, 2025