HF Dataset: Array3D vs Image, which one is better and why

HumpyDonkey · April 3, 2023, 5:53am

Hi team,

I just have a few questions around Array3D vs Image type:

Say I have a column that stores a 3D numpy array in a HF dataset. Is it always better to declare the column as an “Image” type as opposed to the “Array3D” type?
In my experience, declaring a column as the “Image” type is much faster than declaring it as the “Array3D” type when iterating over the dataset. I have never run a proper benchmark, so I’m wondering whether the “Image” column type really is much faster than Array3D. If so, could you shed some light on why the “Image” column is fast, or why Array3D is slower? What is the overhead of Array3D?
In what circumstances would we want to declare an image as an “Array3D” column instead of an “Image” column?

More context
I have a computer vision training dataset with an image column declared as the “Image” type. The image data is stored as image file paths (strings).
Before training, during data preprocessing, I apply a few image preprocessing operations (e.g. resize) and use the preprocessed images for training. For technical reasons, I have to use dataset.map() (instead of with_transform()) to apply the preprocessing eagerly, so I need to store the preprocessed images in the HF Dataset via map(). I can declare the column type for the preprocessed image via the features parameter of map(). I tried both Array3D and Image, and the Image type is 2x faster than Array3D in every training epoch.

Thanks!
