My dataset has 5000000 rows, and I would like to add a column called "embeddings" to it.
dataset = dataset.add_column('embeddings', embeddings)
The variable embeddings is a NumPy memmap array of shape (5000000, 512).
But I get this error:
ArrowInvalid Traceback (most recent call last)
in
----> 1 dataset = dataset.add_column('embeddings', embeddings)

/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py in wrapper(*args, **kwargs)
486 }
487 # apply actual function
--> 488 out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
489 datasets: List["Dataset"] = list(out.values()) if isinstance(out, dict) else [out]
490 # re-apply format to the output

/opt/conda/lib/python3.8/site-packages/datasets/fingerprint.py in wrapper(*args, **kwargs)
404 # Call actual function
405
--> 406 out = func(self, *args, **kwargs)
407
408 # Update fingerprint of in-place transforms + update in-place history of transforms

/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py in add_column(self, name, column, new_fingerprint)
3346 :class:Dataset
3347 """
-> 3348 column_table = InMemoryTable.from_pydict({name: column})
3349 # Concatenate tables horizontally
3350 table = ConcatenationTable.from_tables([self._data, column_table], axis=1)

/opt/conda/lib/python3.8/site-packages/datasets/table.py in from_pydict(cls, *args, **kwargs)
367 @classmethod
368 def from_pydict(cls, *args, **kwargs):
--> 369 return cls(pa.Table.from_pydict(*args, **kwargs))
370
371 @inject_arrow_table_documentation(pa.Table.from_batches)

/opt/conda/lib/python3.8/site-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pydict()
/opt/conda/lib/python3.8/site-packages/pyarrow/table.pxi in pyarrow.lib._from_pydict()
/opt/conda/lib/python3.8/site-packages/pyarrow/array.pxi in pyarrow.lib.asarray()
/opt/conda/lib/python3.8/site-packages/pyarrow/array.pxi in pyarrow.lib.array()
/opt/conda/lib/python3.8/site-packages/pyarrow/array.pxi in pyarrow.lib._ndarray_to_array()
/opt/conda/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowInvalid: only handle 1-dimensional arrays
How can I solve this?