Hi! It looks like a bug. Could you share the state.json file that sits next to the .arrow file of your dataset on HDFS? It could be useful for debugging.
Other than that, have you considered loading the dataset in streaming mode? Streaming mode doesn't support the fs parameter yet, so it should work provided you are able to mount your HDFS locally.
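
If you do manage to mount HDFS locally, something like the sketch below might be a quick way to grab the contents of state.json and see which .arrow files sit next to it. It only uses the Python standard library. The function name `describe_saved_dataset` and the mount path in the comment are made up for illustration, and it assumes the dataset directory has the flat layout described above (state.json alongside the .arrow files).

```python
import json
from pathlib import Path


def describe_saved_dataset(dataset_dir):
    """Return (parsed state.json, sorted names of .arrow files beside it).

    Hypothetical helper: assumes state.json and the .arrow files live
    directly inside dataset_dir.
    """
    root = Path(dataset_dir)
    with (root / "state.json").open(encoding="utf-8") as f:
        state = json.load(f)
    arrow_files = sorted(p.name for p in root.glob("*.arrow"))
    return state, arrow_files


# Example usage (the path is a placeholder for your local HDFS mount):
# state, arrow_files = describe_saved_dataset("/mnt/hdfs/path/to/dataset")
# print(json.dumps(state, indent=2))
# print(arrow_files)
```

Pasting the printed JSON together with the list of .arrow files here should be enough to check whether the two are consistent.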