Iterable of batches from IterableDataset

simonberrebi · September 2, 2024, 11:59pm

According to the documentation, I should be able to call the `batch` method on an `IterableDataset`. However, the following code raises `AttributeError: 'IterableDataset' object has no attribute 'batch'`:

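```python
from datasets import load_dataset

dataset = load_dataset("parquet", data_files="part-*-b4b8fd5e-a0a7-45e2-9b70-7c526ae44202-c000.zstd.parquet", streaming=True)

# Raises AttributeError on the datasets version I have installed
batched = dataset["train"].batch(batch_size=32)
```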
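If upgrading is not possible right away, a similar grouping can be done in plain Python, because an `IterableDataset` can be iterated directly and yields one dict per example. The sketch below assumes the goal is the dict-of-lists layout that, as far as I understand, `IterableDataset.batch` produces (column name mapped to a list of values). `iter_batches` is a hypothetical helper, not part of the `datasets` API.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def iter_batches(
    examples: Iterable[Dict[str, Any]],
    batch_size: int,
    drop_last_batch: bool = False,
) -> Iterator[Dict[str, List[Any]]]:
    """Hypothetical fallback: group a stream of dict rows into dict-of-lists batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(examples)
    while True:
        # Pull at most batch_size rows without materializing the whole stream
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        if drop_last_batch and len(chunk) < batch_size:
            return
        # Transpose the list of rows into a dict of columns
        yield {key: [row[key] for row in chunk] for key in chunk[0]}


rows = ({"id": i, "text": f"row {i}"} for i in range(5))
print(next(iter_batches(rows, batch_size=2)))  # {'id': [0, 1], 'text': ['row 0', 'row 1']}

# With the streaming dataset from above (assumed to yield dict rows):
# for batch in iter_batches(dataset["train"], batch_size=32):
#     ...
```

Because it only uses `islice`, the helper stays lazy and works with streaming data; it assumes every row has the same keys as the first row of its batch.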

windmaple · September 24, 2024, 4:27am

I had the same issue. Upgrading `datasets` (for example with `pip install -U datasets`) fixed it for me.
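One possible reason the error persists after upgrading is that the upgrade went into a different environment than the one running the script. A minimal sketch, using only the standard library plus an optional import of `datasets`, to check which version the current interpreter sees and whether its `IterableDataset` class has `batch`; the helper names `installed_version` and `iterable_dataset_has_batch` are hypothetical.

```python
import importlib.metadata
import importlib.util
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def iterable_dataset_has_batch() -> bool:
    """True if the importable datasets.IterableDataset class defines a batch method."""
    if importlib.util.find_spec("datasets") is None:
        return False
    from datasets import IterableDataset

    return hasattr(IterableDataset, "batch")


print("datasets version:", installed_version("datasets"))
print("IterableDataset.batch available:", iterable_dataset_has_batch())
```

Running this with the same interpreter as the failing script shows whether the upgrade actually took effect there.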

Related topics

Topic	Category	Replies	Views	Activity
How do i batch in streaming of data set	Intermediate	1	42	May 3, 2025
Issue with iterable dataset that is stuck on StopIteration	🤗Datasets	4	216	August 19, 2024
Roadmap/timeline for dataset streaming	🤗Datasets	9	2271	July 5, 2021
How do I iterate through <class 'datasets.dataset_dict.IterableDatasetDict'>?	Beginners	2	2896	January 15, 2024
One-to-many batch mapping with IterableDatasets and batch_size=1 doesn't work	🤗Datasets	2	22	April 14, 2025
