Problem with torch.multiprocessing and Roberta

I have a project in which I extract entities from multiple files, line by line. I wrote a function that receives a file together with Roberta and its tokenizer. The idea is to spawn multiple processes and run this function asynchronously for each file (at this point the files are already loaded into memory). I have 16GB of RAM on my machine and thought this would be enough to run at least 2 or 3 Robertas in parallel, but the following code hangs, fills 100% of my RAM, and does nothing. Does anyone know what I am doing wrong with the multiprocessing code? I’ve simplified my problem to these few lines of code, which show the same behavior.

    import torch
    from transformers import RobertaTokenizer, RobertaModel,RobertaForTokenClassification
    from tqdm import tqdm
    from torch.multiprocessing import Pool
    import torch.multiprocessing as mp

    model = RobertaForTokenClassification.from_pretrained('distilroberta-base')
    tokenizer = RobertaTokenizer.from_pretrained('distilroberta-base')

    model.share_memory() # is this necessary?

    ctx = mp.get_context('spawn')
    p = ctx.Pool(2)

    def f(model,tokenizer,sentence):

        inputs = tokenizer(sentence, return_tensors="pt")

        logits = model(**inputs)
        return 0

    sentences = [
        'yo this is a test',
        'yo this is not a test',
        'yo yo yo'
    ]
    jobs = []
    with torch.no_grad():
        for i in range(len(sentences)):
            job = p.apply_async(f, [model,tokenizer,sentences[i]])
            jobs.append(job)

        for job in tqdm(jobs):
            job.get()
This actually doesn’t work even with just one worker:

    p = ctx.Pool(1)

So I think it is related to the multiprocessing code.

This might be related to the fast tokenizer, which is a multiprocessing-enabled Rust tokenizer. I’d suggest keeping one tokenizer in a separate process and using a queue to request tokenization. You can then run different models in different processes.
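As a rough illustration of the queue idea, here is a minimal sketch using only the standard library. The names `tokenizer_worker`, `tokenize_all`, `make_toy_tokenizer`, `make_roberta_tokenizer` and the `STOP` sentinel are my own, not part of any library, and the toy tokenizer is just a whitespace split so the pattern can be tried without downloading a model. The point it shows is that the tokenizer is built inside the worker process and only plain Python data travels through the queues.

```python
import multiprocessing as mp

STOP = None  # sentinel telling the worker to shut down (hypothetical convention)


def make_toy_tokenizer():
    """Hypothetical stand-in so the sketch runs without downloading a model."""
    return str.split


def make_roberta_tokenizer():
    """Builds the real tokenizer inside the worker (needs transformers installed)."""
    from transformers import RobertaTokenizer

    tok = RobertaTokenizer.from_pretrained("distilroberta-base")
    # Return plain lists of ids, which pickle cheaply across the queue.
    return lambda sentence: tok(sentence)["input_ids"]


def tokenizer_worker(requests, results, make_tokenizer):
    """Serve (index, sentence) requests until the STOP sentinel arrives."""
    tokenize = make_tokenizer()  # created here, never pickled across processes
    while True:
        item = requests.get()
        if item is STOP:
            break
        idx, sentence = item
        results.put((idx, tokenize(sentence)))


def tokenize_all(sentences, make_tokenizer=make_toy_tokenizer):
    """Send every sentence to one tokenizer process and return results in order."""
    ctx = mp.get_context("spawn")
    requests, results = ctx.Queue(), ctx.Queue()
    worker = ctx.Process(
        target=tokenizer_worker, args=(requests, results, make_tokenizer)
    )
    worker.start()
    for idx, sentence in enumerate(sentences):
        requests.put((idx, sentence))
    requests.put(STOP)
    # Drain the result queue before joining so the worker can exit cleanly.
    collected = dict(results.get() for _ in sentences)
    worker.join()
    return [collected[i] for i in range(len(sentences))]


# Usage (the guard is required with the "spawn" start method):
# if __name__ == "__main__":
#     print(tokenize_all(["yo this is a test", "yo yo yo"]))
```

In this sketch the main process is the one asking for tokenization; in a setup with several model processes, each of them would put requests on the same request queue instead. Note that with the `spawn` start method the child re-imports the main module, so anything that creates processes or pools has to sit under an `if __name__ == "__main__":` guard.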


  • The first thing to do to speed everything up is to use a single model and large batches rather than processing line by line, which is incredibly slow.
  • If you have a GPU available, use it. It’ll be faster.
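To make the two points above concrete, here is a minimal sketch of batched inference with a single model. The helper names `batches` and `predict_all`, the default `batch_size=32`, and the `device` argument are assumptions of mine for illustration; the `model` and `tokenizer` are expected to be the ones loaded in the question.

```python
def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_all(sentences, model, tokenizer, batch_size=32, device="cpu"):
    """Run one model over all sentences, one padded batch at a time."""
    import torch  # imported lazily so the batching helper works on its own

    model = model.to(device).eval()
    predictions = []
    with torch.no_grad():
        for batch in batches(sentences, batch_size):
            # Pad to the longest sentence in the batch and move tensors to the device.
            inputs = tokenizer(
                batch, padding=True, truncation=True, return_tensors="pt"
            ).to(device)
            logits = model(**inputs).logits  # shape: (batch, seq_len, num_labels)
            # One label id per token position, padding positions included.
            predictions.extend(logits.argmax(dim=-1).tolist())
    return predictions


# Usage, assuming `model` and `tokenizer` are loaded as in the question:
# device = "cuda" if torch.cuda.is_available() else "cpu"
# labels = predict_all(sentences, model, tokenizer, batch_size=64, device=device)
```

The batch size is something to tune against available memory; the predictions for padded positions would need to be masked out (for example with the attention mask) before being used as entities.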