Are there any pointers on how to use the Dataset.from_generator() function?
When I tried using .from_generator(), e.g.
import pandas as pd
from datasets import Dataset
from datasets.filesystems import S3FileSystem
s3 = S3FileSystem()
with s3.open("s3://mydata/data.tsv") as fin:
    df = pd.read_csv(fin, sep='\t', chunksize=50)  # df is iterable.
ds = Dataset.from_generator(df)
it was throwing an error:
AttributeError: 'S3File' object has no attribute 'name'
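For reference, Dataset.from_generator() is documented as taking a generator function (a callable that yields examples), not an already-built iterable such as the chunked reader above. A minimal sketch of that pattern follows. The names tsv_rows and build_dataset are hypothetical helpers, not part of datasets. The sketch assumes the file can be opened by path from inside the generator. For an s3:// URL that would additionally assume s3fs is installed so pandas can open it directly. Whether this avoids the S3File error in a given setup is untested here.

```python
import pandas as pd


def tsv_rows(path, chunksize=50):
    """Yield one example (a plain dict) per row, reading the TSV in chunks."""
    # The file is opened inside the generator, so only the path string
    # (not an open file handle) is handed to Dataset.from_generator.
    for chunk in pd.read_csv(path, sep="\t", chunksize=chunksize):
        for row in chunk.to_dict(orient="records"):
            yield row


def build_dataset(path, chunksize=50):
    """Hypothetical helper: build a Dataset from the row generator above."""
    # Imported lazily so tsv_rows can be tried without `datasets` installed.
    from datasets import Dataset

    # Pass the generator *function* itself; its arguments go in gen_kwargs.
    return Dataset.from_generator(
        tsv_rows, gen_kwargs={"path": path, "chunksize": chunksize}
    )
```

Usage would then look like ds = build_dataset("s3://mydata/data.tsv"), with the path taken from the snippet above.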