Reproducing the issue from GitHub: Deadlock when loading the model in multiprocessing context (huggingface/transformers, issue #15976)
I am using the following snippet:
import torch
from pathlib import Path
import multiprocessing as mp
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

queue = mp.Queue()

def load_model(filename):
    device = queue.get()
    print('Loading')
    model = AutoModelForSeq2SeqLM.from_pretrained('models/sqgen').to(device)
    print('Loaded')
    queue.put(device)
def parallel():
    num_gpus = torch.cuda.device_count()
    # Hand out one device name per GPU through the shared queue.
    for gpu_id in range(num_gpus):
        queue.put('cuda:{0}'.format(gpu_id))
    pool = mp.Pool(processes=num_gpus)
    flist = list(Path('data').glob('*.json'))
    pool.map(
        load_model,
        flist,
    )
    pool.close()
    pool.join()

if __name__ == '__main__':
    parallel()
This just hangs when loading the model. This is a minimal example I cooked up to demonstrate the issue.
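A quick sanity check (a minimal sketch, assuming the default start method on Linux; this is not part of the snippet above) is to print which start method the pool will use and whether CUDA is already initialized in the parent before the workers are created, since forking a process that holds a live CUDA context is a known source of hangs:

import multiprocessing as mp
import torch

print(mp.get_start_method())        # 'fork' by default on Linux
print(torch.cuda.is_initialized())  # if True before the pool is created, forked workers inherit a live CUDA context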
What I am actually doing is that I have 16 large files (possibly more) and 8 GPUs, so I am trying to assign each file to a GPU and run the inference in parallel, 8 processes at a time, to use all GPUs simultaneously.
Why is this issue happening? Why does model loading deadlock?
What’s the right way to do what I want to achieve?
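One direction that should sidestep the deadlock (a sketch under the assumptions of the snippet above, i.e. a 'models/sqgen' checkpoint and 'data/*.json' inputs; not a confirmed fix from the issue) is to use the 'spawn' start method so workers start clean instead of forking the parent, split the files into one chunk per GPU, and let each worker load the model once on its assigned device rather than passing devices through a shared queue:

import multiprocessing as mp
from pathlib import Path

import torch
from transformers import AutoModelForSeq2SeqLM

def run_on_gpu(args):
    device, files = args
    # Each worker loads its own copy of the model, once, on its assigned GPU.
    model = AutoModelForSeq2SeqLM.from_pretrained('models/sqgen').to(device)
    model.eval()
    for filename in files:
        # ... run inference on `filename` with `model` ...
        pass

def parallel():
    num_gpus = torch.cuda.device_count()
    flist = sorted(Path('data').glob('*.json'))
    # One chunk of files per GPU, e.g. 16 files over 8 GPUs -> 2 files each.
    chunks = [('cuda:{0}'.format(i), flist[i::num_gpus]) for i in range(num_gpus)]
    ctx = mp.get_context('spawn')  # spawn starts clean worker processes instead of forking
    with ctx.Pool(processes=num_gpus) as pool:
        pool.map(run_on_gpu, chunks)

if __name__ == '__main__':
    parallel()

Each worker keeps its model and CUDA context confined to its own process, at the cost of loading the checkpoint once per GPU, which matches the goal of processing the 16 files on 8 GPUs simultaneously.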