Is there a way to load this dataframe into the train split, and another in-memory dataframe into the validation split?
None of the following options seem to do the trick:
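import datasets
from datasets import Dataset, NamedSplit
# df is the in-memory pandas DataFrame mentioned above
dataset = Dataset.from_pandas(df)
dataset = Dataset.from_pandas(df, split='train')
dataset = Dataset.from_pandas(df, split=NamedSplit('train'))
dataset = Dataset.from_pandas(df, split=datasets.Split.TRAIN)
print(dataset)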
The best I could come up with that worked was the following (not sure if there is an easier or more correct way):
import pandas as pd
from datasets import Dataset, DatasetDict
# Two in-memory DataFrames: one for training, one for validation
tdf = pd.DataFrame({"a": [1, 2, 3], "b": ['hello', 'ola', 'thammi']})
vdf = pd.DataFrame({"a": [4, 5, 6], "b": ['four', 'five', 'six']})
# Convert each DataFrame into its own Dataset
tds = Dataset.from_pandas(tdf)
vds = Dataset.from_pandas(vdf)
# Collect the Datasets under named splits in a DatasetDict
ds = DatasetDict()
ds['train'] = tds
ds['validation'] = vds
print(ds)