Reading Arrow files fails with: Invalid: Not an Arrow file

I'm trying to read the Hugging Face Arrow files with libarrow, in both C++ and Python, and I get:
Invalid: Not an Arrow file

Python code:
import pyarrow as pa
with open('glue-test.arrow', 'rb') as f:
    data = pa.ipc.open_file(f)

C++ code:
std::shared_ptr<arrow::io::ReadableFile> infile;
ARROW_ASSIGN_OR_RAISE(infile, arrow::io::ReadableFile::Open("data-00000-of-00001.arrow", arrow::default_memory_pool()));
ARROW_ASSIGN_OR_RAISE(auto ipc_reader, arrow::ipc::RecordBatchFileReader::Open(infile));

Both result in:
Invalid: Not an Arrow file

I create the Arrow files using:
from datasets import load_dataset
snil = load_dataset('snli', split='train')
snil.save_to_disk("tempdata")

Any ideas would be appreciated. I'm really stuck on this.

We're currently using the Arrow streaming format, not the IPC file format.

You can try using RecordBatchStreamReader instead.
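
For the Python side, here is a minimal sketch of what that could look like. It relies on the Arrow IPC file format starting and ending with the magic bytes ARROW1, which the streaming format does not have. The helper names is_ipc_file_format and read_arrow_table are made up for this example and are not part of pyarrow or datasets; the only pyarrow calls used are pa.ipc.open_file, pa.ipc.open_stream and read_all.

```python
from pathlib import Path

# Magic bytes that open and close an Arrow IPC *file* (random-access format).
# The streaming format has no such marker.
ARROW_FILE_MAGIC = b"ARROW1"


def is_ipc_file_format(path):
    """Hypothetical helper: True if `path` starts and ends with the file magic."""
    magic_len = len(ARROW_FILE_MAGIC)
    if Path(path).stat().st_size < 2 * magic_len:
        return False
    with open(path, "rb") as f:
        head = f.read(magic_len)
        f.seek(-magic_len, 2)  # 2 means "relative to the end of the file"
        tail = f.read(magic_len)
    return head == ARROW_FILE_MAGIC and tail == ARROW_FILE_MAGIC


def read_arrow_table(path):
    """Hypothetical helper: load `path` into a pyarrow.Table with the matching reader."""
    import pyarrow as pa  # imported lazily; assumes pyarrow is installed

    with open(path, "rb") as f:
        if is_ipc_file_format(path):
            reader = pa.ipc.open_file(f)    # IPC file format
        else:
            reader = pa.ipc.open_stream(f)  # streaming format, as written by datasets
        return reader.read_all()
```

With the files from the question, read_arrow_table("glue-test.arrow") should take the open_stream branch. If you already know the file came from save_to_disk, calling pa.ipc.open_stream(f) directly in place of pa.ipc.open_file(f) is enough.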

Thank you! That seems to work.

Thanks for your advice!