Problem with torch.multiprocessing and Roberta

I have a project in which I extract entities from multiple files, line by line, so I wrote a function that receives a file along with RoBERTa and its tokenizer. The idea is to spawn multiple processes and run this function asynchronously for each file (at this point the files are already loaded into memory). I have 16 GB of RAM on my machine and thought that would be enough to run at least 2 or 3 RoBERTa instances in parallel, but my code hangs, fills 100% of my RAM, and does nothing. Does anyone know what I am doing wrong with the multiprocessed code?
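For context, the per-file function looks roughly like the sketch below (the name extract_entities_from_file and the entity-decoding step are placeholders; the real version is longer):

    def extract_entities_from_file(model, tokenizer, lines):
        # `lines` is one file's contents, already loaded into memory
        entities = []
        with torch.no_grad():
            for line in lines:
                inputs = tokenizer(line, return_tensors="pt")
                outputs = model(**inputs)  # TokenClassifierOutput in transformers v4+
                # outputs.logits has shape (1, seq_len, num_labels); mapping the
                # argmax label ids back to entity spans is omitted here
                entities.append(outputs.logits.argmax(dim=-1))
        return entities

I've simplified my problem to the few lines below, which show the same behavior: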

    import torch
    import torch.multiprocessing as mp
    from transformers import RobertaTokenizer, RobertaForTokenClassification
    from tqdm import tqdm

    model = RobertaForTokenClassification.from_pretrained('distilroberta-base')
    tokenizer = RobertaTokenizer.from_pretrained('distilroberta-base')

    model.share_memory() # is this necessary?
    model.eval()

    ctx = mp.get_context('spawn')
    p = ctx.Pool(2)

   
    def f(model, tokenizer, sentence):
        inputs = tokenizer(sentence, return_tensors="pt")
        outputs = model(**inputs)  # forward pass only; the logits live in outputs.logits
        return 0


    sentences = [
        'yo this is a test',
        'yo this is not a test',
        'yo yo yo'
    ]

    jobs = []
    with torch.no_grad():
        for sentence in sentences:
            job = p.apply_async(f, [model, tokenizer, sentence])
            jobs.append(job)

        results = []
        for job in tqdm(jobs):
            results.append(job.get())
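
For reference, this is the equivalent single-process call I'd compare against:

    # baseline: the same forward pass, no pool involved
    with torch.no_grad():
        f(model, tokenizer, sentences[0])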