Read data from HDFS

hey @gfork, i’ve never tried it myself, but the datasets library lets you process data with Apache Beam: see the "Beam Datasets" page in the datasets 1.5.0 documentation

perhaps that is suitable for your use case?