I am running T5-base-grammar-correction for grammar correction on the text column of my dataframe:
    from happytransformer import HappyTextToText
    from happytransformer import TTSettings
    from tqdm.notebook import tqdm

    tqdm.pandas()

    happy_tt = HappyTextToText("T5", "./t5-base-grammar-correction")
    beam_settings = TTSettings(num_beams=5, min_length=1, max_length=30)

    def grammer_pipeline(text):
        text = "gec: " + text
        result = happy_tt.generate_text(text, args=beam_settings)
        return result.text

    df['new_text'] = df['original_text'].progress_apply(grammer_pipeline)
The pandas apply call runs and produces the required results, but it is quite slow.
I also get the warning below while executing the code:
    /home/.local/lib/python3.6/site-packages/transformers/pipelines/base.py:908: UserWarning: You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
      UserWarning,
How can I use a Dataset to speed things up?
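To make the question concrete, here is a minimal sketch of the kind of batching I have in mind, in place of one model call per row. The names `correct_in_batches` and `make_pipeline_generator` are hypothetical helpers, not part of happytransformer or transformers. The second helper assumes the model directory can be loaded directly by the Hugging Face `text2text-generation` pipeline and that a GPU is available as device 0; whether this actually silences the warning or matches the per-row output exactly is not something I can confirm.

```python
def correct_in_batches(texts, generate_batch, batch_size=16, prefix="gec: "):
    """Run generate_batch over texts in chunks instead of one row at a time.

    generate_batch is any callable that takes a list of prompt strings and
    returns a list of corrected strings of the same length.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Same "gec: " prefix as in the per-row function.
    prompts = [prefix + str(t) for t in texts]
    results = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        outputs = generate_batch(batch)
        if len(outputs) != len(batch):
            raise ValueError("generate_batch must return one output per input")
        results.extend(outputs)
    return results


def make_pipeline_generator(model_path="./t5-base-grammar-correction", device=0):
    """Hypothetical wrapper around a transformers pipeline (assumption: the
    model directory loads with the text2text-generation task)."""
    from transformers import pipeline  # imported lazily; needs transformers + torch

    pipe = pipeline("text2text-generation", model=model_path, device=device)

    def generate_batch(batch):
        outputs = pipe(batch, batch_size=len(batch),
                       num_beams=5, min_length=1, max_length=30)
        # Depending on the transformers version, each item may be a dict or a
        # one-element list of dicts, so both shapes are handled here.
        return [(o[0] if isinstance(o, list) else o)["generated_text"]
                for o in outputs]

    return generate_batch


# Intended usage (assumes df is the dataframe from above):
# df["new_text"] = correct_in_batches(df["original_text"].tolist(),
#                                     make_pipeline_generator())
```

The model call is passed in as a function so the batching logic can be checked without loading the model. Is something like this the right direction, or is the warning asking for an actual Dataset object to be passed to the pipeline?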