AutoModel inference hangs when run under multiprocessing


I’m trying to speed up prediction with an AutoModelForSequenceClassification model.

Here is the code I use:

import numpy as np
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")

def infer(sentence):
    seq = tokenizer(sentence.lower(), return_tensors="pt")
    pred = np.argmax(np.array(model(**seq).logits.tolist()[0]))
    return pred

The infer function works fine when called on its own, or via a plain Python map(infer, list_of_sentences).

However, the following hangs forever, even with N = 1:

import multiprocessing as mp

with mp.Pool(N) as pool:
    preds = list(pool.map(infer, sample))

What prevents Hugging Face Transformers models from being run in parallel from a multiprocessing Pool? (transformers 4.6.0)