Sentiment analysis with large Pandas dataframe

Hey everyone,

So I’m working on a project that deals with textual data, and I have roughly 390k rows in the dataframe.

I tried mapping a function over the dataframe to run a transformer pipeline for sentiment analysis, but it’s taking quite a while, approximately 25 hours… Is there any way I can deal with that?

Thanks in advance.


Can you share your code?

Typically, the approach when starting from a dataframe is:

from datasets import Dataset

df = ...         # your pd.DataFrame
tokenizer = ...  # your tokenizer

dataset = Dataset.from_pandas(df)
# batched=True lets the tokenizer process many rows per call instead of one at a time
encoded_dataset = dataset.map(lambda examples: tokenizer(examples['sentence1']),
                              batched=True)

Note the “batched=True” argument, which should greatly speed things up.
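Once the data is in a Dataset, you can also feed the pipeline itself in batches (and run it on GPU if you have one). Here’s a rough sketch, assuming your text lives in a column named "text", that the default sentiment-analysis pipeline fits your use case, and that a batch_size of 32 is reasonable for your hardware (tune these to your setup):

from datasets import Dataset
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset

dataset = Dataset.from_pandas(df)  # df is your 390k-row dataframe

# device=0 uses the first GPU; drop it (or use device=-1) to stay on CPU
pipe = pipeline("sentiment-analysis", device=0)

results = []
# KeyDataset streams one column to the pipeline; batch_size groups rows
# so the model sees many examples per forward pass instead of one
for out in pipe(KeyDataset(dataset, "text"), batch_size=32, truncation=True):
    results.append(out)

df["sentiment"] = [r["label"] for r in results]

Batching like this (plus a GPU, if available) should cut the runtime substantially compared with calling the pipeline one row at a time via .apply or .map.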
