Hello, I am trying to load the train and test DataFrames into a Dataset object. The usual way to load a pandas DataFrame into a Dataset object is:
from datasets import Dataset
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3]})
dataset = Dataset.from_pandas(df)
My question is: how do I load both the train and test pandas DataFrames into the dataset?
For example, if I have two DataFrames:
from datasets import Dataset
import pandas as pd
df_train = pd.DataFrame({"a": [1, 2, 3]})
df_test = pd.DataFrame({"ab": [1, 2, 3]})
How do I load these two frames?
Hi! You can concatenate them with:
from datasets import Dataset, concatenate_datasets
import pandas as pd
df_train = pd.DataFrame({"a": [1, 2, 3]})
df_test = pd.DataFrame({"ab": [1, 2, 3]})
ds_train = Dataset.from_pandas(df_train)
ds_test = Dataset.from_pandas(df_test)
ds = concatenate_datasets([ds_train, ds_test])
Hi, I am looking for something like this, with the train and test splits kept separate:
DatasetDict({
    train: Dataset({
        features: ['a'],
        num_rows: 3
    })
    test: Dataset({
        features: ['ab'],
        num_rows: 3
    })
})
limsc
This should work.
from datasets import Dataset, DatasetDict
import pandas as pd
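df_train = pd.DataFrame({'a': [1, 2, 3]})
df_test = pd.DataFrame({'ab': [1, 2, 3]})
# Map each split name to its own Dataset
ds_dict = {'train': Dataset.from_pandas(df_train),
           'test': Dataset.from_pandas(df_test)}
# Wrap the mapping so each split is accessible as ds['train'], ds['test']
ds = DatasetDict(ds_dict)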