Hello
I am trying to learn how to do embeddings etc. In an earlier run, I saved some embeddings into Arrow files using dataset.save_to_disk(),
which generated two files: data-00000-of-00002.arrow and data-00001-of-00002.arrow
Today, I want to load these two files into one dataset.
What I've tried: Dataset.from_file("/content/drive/MyDrive/data-00000-of-00002.arrow", "/content/drive/MyDrive/data-00001-of-00002.arrow")
that is, passing the two files to from_file,
but this gives me an error: AttributeError: 'str' object has no attribute 'copy'
So how do I load these two into one dataset?
Expected outcome: the original dataset that I had created and saved to disk is re-created from these two files.
thank you
Update: Unsure if this is a workaround or something, but I used from datasets import load_dataset
and then data = load_dataset("arrow", data_files={file1, file2, file3})
instead of Dataset.from_file,
and this works. I am kinda glad my question went into the Akismet queue, because it made me want to try alternatives. If someone could tell me how to use the Dataset.from_file
function to do the same, I would love that.
Cheers