Hello everyone,
I am doing a tutorial on how to fine-tune a pretrained sentiment analysis classifier, and all of the fine-tuning part is based on a HuggingFace Dataset. Is there a way to transform a pandas DataFrame into a HuggingFace Dataset? It would help me a lot with my data preprocessing…
ehalit
August 18, 2021, 10:35am
You can have a look here: link
Thanks for your help! Now it works
akomma
February 23, 2022, 6:57am
Is there a way to load this into the train split and another in-memory DataFrame into the validation split?
None of the following options seem to do the trick:
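import datasets
from datasets import Dataset, NamedSplit
# df is an existing pandas DataFrame
# each call below returns a single Dataset, not a train/validation DatasetDict
dataset = Dataset.from_pandas(df)
dataset = Dataset.from_pandas(df, split='train')
dataset = Dataset.from_pandas(df, split=NamedSplit('train'))
dataset = Dataset.from_pandas(df, split=datasets.Split.TRAIN)
print(dataset)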
The best I could come up with that worked was this (not sure if there is an easier or more correct way):
import pandas as pd
from datasets import Dataset, DatasetDict
# one in-memory DataFrame per split
tdf = pd.DataFrame({"a": [1, 2, 3], "b": ['hello', 'ola', 'thammi']})
vdf = pd.DataFrame({"a": [4, 5, 6], "b": ['four', 'five', 'six']})
# convert each DataFrame into its own Dataset
tds = Dataset.from_pandas(tdf)
vds = Dataset.from_pandas(vdf)
# collect the Datasets under split names in a DatasetDict
ds = DatasetDict()
ds['train'] = tds
ds['validation'] = vds
print(ds)
Hi @akomma! Yes, your second approach is the correct one.
The page is no longer there. Could you please post the link again, or give the page name?
ehalit
December 3, 2023, 9:01am
Look for the from_pandas method at link
Hi @mariosasko! What did you mean by the second approach?
dataset = Dataset.from_pandas(df, split='train')
still doesn't work in 2024. Do I still have to create it with something like this?
dataset = DatasetDict({"train": tds, "val": vds})