How to load two pandas DataFrames into a Dataset object?

Hello, I am trying to load my train and test DataFrames into a dataset object. The usual way to load a pandas DataFrame into a dataset object is:

from datasets import Dataset
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3]})
dataset = Dataset.from_pandas(df)

My question is: how do I load both the train and test pandas DataFrames into the dataset?

For example, if I have two DataFrames:

from datasets import Dataset
import pandas as pd
df_train = pd.DataFrame({"a": [1, 2, 3]})
df_test = pd.DataFrame({"ab": [1, 2, 3]})

How do I load these two frames?

Hi! You can concatenate them with:

from datasets import Dataset, concatenate_datasets
import pandas as pd
df_train = pd.DataFrame({"a": [1, 2, 3]})
df_test = pd.DataFrame({"ab": [1, 2, 3]}) 
ds_train = Dataset.from_pandas(df_train)
ds_test = Dataset.from_pandas(df_test)
ds = concatenate_datasets([ds_train, ds_test])

Hi, I am looking for something like this:

DatasetDict({
    train: Dataset({
        features: ['a'],
        num_rows: 3
    })
    test: Dataset({
        features: ['ab'],
        num_rows: 3
    })
})

This should work.

from datasets import Dataset, DatasetDict
import pandas as pd

df_train = pd.DataFrame({'a': [1, 2, 3]})
df_test = pd.DataFrame({'ab': [1, 2, 3]})

ds_dict = {'train': Dataset.from_pandas(df_train),
           'test': Dataset.from_pandas(df_test)}

ds = DatasetDict(ds_dict)