I understand that I would normally save a list of data (e.g., list = [{"text": "...", "label": 1}, {"text": "...", "label": 1}, ...]) as a JSON file and load it as a dataset with load_dataset("json", data_files="/path/data.json"). In this use case, however, the list is generated on the fly, so I would like to load it directly as a dataset. I tried load_dataset(list) but got the following error:
TypeError: expected str, bytes or os.PathLike object, not list
Is there a way to load a list directly as a dataset?
Is it possible to create a similar dataset from a list of strings?
Currently I do
dataset = Dataset.from_list(math_sentences)  # math_sentences is a list of strings
but I get
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[16], line 3
1 from datasets import Dataset
----> 3 dataset = Dataset.from_list(math_sentences) #math_sentences is array of string
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\datasets\arrow_dataset.py:950, in Dataset.from_list(cls, mapping, features, info, split)
934 """
935 Convert a list of dicts to a `pyarrow.Table` to create a [`Dataset`]`.
936
(...)
947 [`Dataset`]
948 """
949 # for simplicity and consistency wrt OptimizedTypedSequence we do not use InMemoryTable.from_pylist here
--> 950 mapping = {k: [r.get(k) for r in mapping] for k in mapping[0]} if mapping else {}
951 return cls.from_dict(mapping, features, info, split)
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\datasets\arrow_dataset.py:950, in <dictcomp>(.0)
934 """
935 Convert a list of dicts to a `pyarrow.Table` to create a [`Dataset`]`.
936
(...)
947 [`Dataset`]
948 """
949 # for simplicity and consistency wrt OptimizedTypedSequence we do not use InMemoryTable.from_pylist here
--> 950 mapping = {k: [r.get(k) for r in mapping] for k in mapping[0]} if mapping else {}
951 return cls.from_dict(mapping, features, info, split)
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\datasets\arrow_dataset.py:950, in <listcomp>(.0)
934 """
935 Convert a list of dicts to a `pyarrow.Table` to create a [`Dataset`]`.
936
(...)
947 [`Dataset`]
948 """
949 # for simplicity and consistency wrt OptimizedTypedSequence we do not use InMemoryTable.from_pylist here
--> 950 mapping = {k: [r.get(k) for r in mapping] for k in mapping[0]} if mapping else {}
951 return cls.from_dict(mapping, features, info, split)
AttributeError: 'str' object has no attribute 'get'
@tempdeltavalue I had the same issue loading a list of strings. Dataset.from_list expects a list of dicts, so the fix was to convert the list of strings into a list of dicts, with each dict containing "text" as the key and the actual string as the value.
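To illustrate why plain strings fail and why the dict wrapping helps, here is a small sketch in plain Python. The helper rows_to_columns is a hypothetical name; its body copies the conversion line shown in the traceback (line 950 of arrow_dataset.py). The example sentences are made up.

```python
def rows_to_columns(rows):
    """Row-to-column conversion copied from the line shown in the traceback."""
    return {k: [r.get(k) for r in rows] for k in rows[0]} if rows else {}


# Hypothetical stand-in for the list of strings from the question.
math_sentences = ["2 + 2 = 4", "3 * 3 = 9"]

# Plain strings fail: iterating over the first string yields characters as
# "keys", and str objects have no .get() method.
try:
    rows_to_columns(math_sentences)
    failed = False
except AttributeError as err:
    failed = True
    print(err)

# Wrapping each string in a dict gives every row a "text" key to look up.
records = [{"text": sentence} for sentence in math_sentences]
columns = rows_to_columns(records)
print(columns)
```

With the strings wrapped this way, passing records to Dataset.from_list(records) should produce a dataset with a single "text" column.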