Datasets + Arrow Help

Hi, I’m just getting started and am excited that Datasets is built on Arrow. But I haven’t seen how to access the Arrow data. For example, how do I use pyarrow or Polars on loaded training data?

You should be able to access the underlying Arrow data through a dataset’s _data attribute. Note that this usage is not intended, though. EDIT: see @mariosasko’s reply. I was a bit too quick; there is also a public property, data, that you can use.

Hi! The underlying Arrow table can be accessed using the data attribute, which can then be loaded in Polars as follows:

import polars as pl
from datasets import load_dataset

dset = load_dataset(...)
# dset.data wraps the Arrow data; .table is the underlying pyarrow.Table
df = pl.from_arrow(dset.data.table)