HF Dataset: Array3D vs Image, which one is better and why?

Hi team,

I have a few questions about the Array3D and Image feature types:

  1. Say I have a column that stores a 3D numpy array in a HF dataset. Is it always better to declare the column as an “Image” type as opposed to the “Array3D” type?
  2. In my experience, iterating over the dataset is much faster when the column is declared as “Image” than when it is declared as “Array3D”. I have not run a benchmark, so I’m wondering whether the “Image” column type really is much faster than “Array3D”. If so, could you shed some light on why the “Image” column is fast, or why “Array3D” is slower? What is the overhead of “Array3D”?
  3. Under what circumstances would we want to declare an image as an “Array3D” column instead of an “Image” column?

More context
I have a computer vision training dataset with an image column declared as the “Image” type. The underlying image data is stored as image file paths (strings).
Before training, during data preprocessing, I apply a few image preprocessing operations (e.g. resize) and use the preprocessed images for training. For technical reasons, I have to use dataset.map() (instead of with_transform()) to apply the preprocessing eagerly, so I need to store the preprocessed images in the HF Dataset via map(). I can declare the column type for the preprocessed image via the features parameter of map(). I tried both Array3D and Image, and the Image type was 2x faster than Array3D in every training epoch.

Thanks!