Dataset set_format

dhruvgrammarly · November 24, 2024, 4:22am

This seems inconsistent with what the documentation says:

`to_numpy(self, zero_copy_only=False)`

Return a NumPy copy of this array (experimental).

Parameters:

**zero_copy_only** : bool, default `False`
Introduced for signature consistence with pyarrow.Array.to_numpy. This must be False here since NumPy arrays’ buffer must be contiguous.

This suggests that it makes a copy of the data rather than doing a zero-copy conversion to NumPy arrays. I’m also running into a problem where loading the data as a NumPy array or as a Python list seems equally slow. Maybe I’m doing something horribly wrong. See: Create batch from list of ids in the dataset is very slow - #4
