Hey everyone,
So I’m working on a project that deals with textual data, and I have roughly 390k rows in the dataframe.
I tried mapping a function that uses a transformers pipeline to analyze the sentiment of each row, but it’s taking quite a while, approximately 25 hours for the full run. Is there any way I can speed this up?
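For reference, what I’m doing is roughly equivalent to this (simplified, and the column names are placeholders):

```python
from transformers import pipeline

pipe = pipeline("sentiment-analysis")

# One pipeline call per row, i.e. ~390k separate forward passes.
df["sentiment"] = df["text"].map(lambda t: pipe(t)[0]["label"])
```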
Thanks in advance.
Typically, the approach when starting from a dataframe is:
from datasets import Dataset

df = ...        # your pd.DataFrame
tokenizer = ... # your tokenizer

dataset = Dataset.from_pandas(df)
encoded_dataset = dataset.map(
    lambda examples: tokenizer(examples['sentence1']),
    batched=True,
)
Note the “batched=True” argument: the mapped function then receives batches of examples instead of single rows, so the tokenizer processes many sentences per call, which should greatly speed things up.
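If you want to stay with the pipeline API for the actual sentiment predictions, the pipeline itself can also batch and stream over a Dataset. A minimal sketch, assuming a standard sentiment-analysis pipeline; the batch_size value and the 'sentence1' column are placeholders you’d adjust:

```python
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset

# device=0 uses the first GPU if one is available.
pipe = pipeline("sentiment-analysis", device=0)

# Stream one column of the dataset through the pipeline in batches,
# instead of calling the pipeline once per row.
labels = [
    out["label"]
    for out in pipe(KeyDataset(dataset, "sentence1"), batch_size=32)
]
```

Batching the forward passes (plus a GPU, if you have one) is usually where most of the 25 hours goes away.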