Very low GPU usage when translating text, datasets not helping

Hi! Hugging Face blew my mind, it's awesome, but I'm struggling to get decent performance out of my 1080 Ti: GPU utilization is very low, around 3%, while the CPU sits at about 30%.

At first I got the warning "UserWarning: You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset", so I switched to using a dataset. The warning is gone now, but nothing seems to have changed. I guess I'm doing something wrong… here's some sample code:

import pandas as pd
from datasets import Dataset
from tqdm import tqdm
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset

to_translate = []  # several Arabic sentences
dataset = Dataset.from_pandas(pd.DataFrame(to_translate))

translator = pipeline(
    'translation_ar_to_en',
    model='Helsinki-NLP/opus-mt-ar-en',
    device=0  # run on the first GPU
)

# mapping function I tried; not used in the loop below
def trans(ds):
    ds['TRANSLATED'] = translator(ds['0'])
    return ds

# the pandas column 0 becomes the dataset column "0"
for out in tqdm(translator(KeyDataset(dataset, "0"))):
    print(out)

I think I didn’t get the dataset part right yet…
Thanks!


Hi Iván! I think you need to pass the batch_size argument when you call your pipeline, so it processes several text sequences at the same time instead of one by one. Something like this should work:

for out in tqdm(translator(KeyDataset(dataset, "0"), batch_size=32)):
    print(out)

You can experiment with different batch sizes to see which one gives you the best throughput on your GPU.
I hope this helps!
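
In case it's useful, here's a minimal sketch of how you might compare batch sizes. The repeated dummy sentence, the list of batch sizes, and the sentences-per-second metric are just placeholder choices of mine; real, varied input will give more realistic numbers since padding costs depend on length differences.

import time
import pandas as pd
from datasets import Dataset
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset

# dummy repeated Arabic sentence; swap in your real data
to_translate = ["مرحبا بالعالم"] * 256
dataset = Dataset.from_pandas(pd.DataFrame(to_translate))

translator = pipeline(
    'translation_ar_to_en',
    model='Helsinki-NLP/opus-mt-ar-en',
    device=0
)

for batch_size in (1, 8, 16, 32, 64):
    start = time.perf_counter()
    for _ in translator(KeyDataset(dataset, "0"), batch_size=batch_size):
        pass  # consume the generator so every batch actually runs
    elapsed = time.perf_counter() - start
    print(f"batch_size={batch_size}: {len(dataset) / elapsed:.1f} sentences/sec")

Larger batches usually help until you run out of GPU memory or the batches stop filling the GPU; the sweet spot depends on your model and sentence lengths.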

Thank you! We're now deploying on a CPU-only server, but I'll keep your reply in our backlog 🙂