Standard getitem returns wrong data type for arrays

Hi,
I have created a Hugging Face dataset in which some columns contain arrays. These arrays are cast to int32, but when I read a value from the dataset I get an int64 array back. Here is a minimal example of the problem:

from datasets import Dataset
from datasets.features import Sequence, Value, Features
import numpy as np
import pandas as pd

# Name the frame df so it does not shadow the pandas module.
df = pd.DataFrame([[[1,2,3],[3,4,5]],[[10,20,30],[30,40,50]]], columns=["A", "B"])
d = Dataset.from_pandas(df, features=Features({
    "A": Sequence(Value(dtype="int32")),
    "B": Sequence(Value(dtype="int32")),
}))
# Return numpy arrays on access.
d.set_format("numpy")

The output of d.features is as expected:

{'A': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None),
 'B': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None)}

But when I check the dtype of one of the values with d["A"][0].dtype, I get dtype('int64').

The same thing happens for arrays with a float dtype. These are always returned as float32, regardless of the dtype specified in the feature.

I tried to find out from the code why this is happening. The problem seems to be that the default dtype specified here is not overwritten with the dtype specified in the feature. I can call d._getitem(0, format_kwargs={"dtype": np.int32}), which returns the array with the correct dtype, but of course I cannot pass format_kwargs in normal data access (e.g. d["A"][0]). I also think the correct dtype should be derived from the features, without the need to specify it manually every time.

Is this behaviour intentional? If yes, why?

Thanks a lot in advance

You can get int32 values with d.set_format("numpy", dtype=np.int32). More info on this issue is available in huggingface/datasets issue #5517 on GitHub, "`with_format("numpy")` silently downcasts float64 to float32 features" (for the float case). We plan to drop this behavior in Datasets 3.0.

Thanks for the fast reply.

The solution with d.set_format("numpy", dtype=np.int32) only works if all columns are int32, correct? When I also have columns with float arrays, those are returned as int32 as well.

For now I can work around this on our side by manually casting the returned columns to the correct dtype before using them, but that is of course not ideal. So I believe it would be wise to discontinue this behaviour in a future release, as you suggested.
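
For anyone hitting the same thing, here is a minimal sketch of the manual cast I mean. It only uses numpy, and the names COLUMN_DTYPES and cast_columns are my own hypothetical helpers, not part of datasets. The mapping is written by hand here; I assume it could also be built from d.features, but I have not shown that. The row dict only simulates what the numpy format returns (int64 and float32 arrays).

```python
import numpy as np

# Hypothetical mapping from column name to the dtype declared in the features.
# Written by hand here; not an API of the datasets library.
COLUMN_DTYPES = {"A": "int32", "B": "float64"}


def cast_columns(batch, column_dtypes):
    """Return a shallow copy of batch with each listed column cast to its dtype."""
    out = dict(batch)
    for name, dtype in column_dtypes.items():
        if name in out:
            # copy=False avoids a copy when the array already has the target dtype.
            out[name] = np.asarray(out[name]).astype(dtype, copy=False)
    return out


# Simulated output of the numpy format: int64 and float32 arrays.
row = {
    "A": np.array([1, 2, 3], dtype=np.int64),
    "B": np.array([0.5, 1.5], dtype=np.float32),
}
fixed = cast_columns(row, COLUMN_DTYPES)
print(fixed["A"].dtype, fixed["B"].dtype)
```

Unlike a single dtype passed to set_format, this keeps a separate dtype per column, so int and float columns can live side by side. Note that a float64 column that was already downcast to float32 will not get its lost precision back by casting afterwards.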