After running some tests: we can run parallel pipelines on a single GPU, but we need to initialize and store a separate model instance per process on the GPU. If we use the same model object from two different processes at the same time, the results are corrupted and no valid transcription comes back. The code I am using is:
```python
import torch
from multiprocessing import Pool

def compute(pipe):
    res = pipe.inference("test.wav", "")
    print(res)

if __name__ == '__main__':
    # whisperModel is my own wrapper around the Whisper pipeline
    pipe = whisperModel('openai/whisper-tiny', 'cuda', torch.float16)
    print(pipe.inference("test.wav", ""))  # single-process inference works
    with Pool(2) as p:
        # the same pipeline object is sent to both worker processes
        p.map(compute, [pipe, pipe])
```
The first result (the single-process call, a valid French transcription) is:

Alors ici la fois, j’aime mon travail, c’est comme ça, si la vie on aime tous notre travail, oui, dites-moi un as en plus, je vous remercie.

but both parallel results are only:

!!!
However, if you initialize two different pipelines and run inference on each of them at the same time on the same GPU, it works.
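For anyone hitting the same issue, here is a minimal sketch of that working setup. It assumes the same whisperModel wrapper class and test.wav file from the snippet above; each worker builds its own pipeline through a Pool initializer instead of receiving a pickled shared one, and the 'spawn' start method is used so an already-initialized CUDA context is never forked into the children.

```python
import torch
from multiprocessing import get_context

# Assumption: whisperModel is the same wrapper class used in the snippet
# above, defined at module level so the 'spawn' workers can re-import it.

_pipe = None  # each worker process gets its own pipeline

def init_worker():
    # Load a separate copy of the model in every worker, so no CUDA
    # state is shared between processes.
    global _pipe
    _pipe = whisperModel('openai/whisper-tiny', 'cuda', torch.float16)

def compute(path):
    return _pipe.inference(path, "")

if __name__ == '__main__':
    # 'spawn' avoids forking a process that already holds a CUDA context.
    ctx = get_context('spawn')
    with ctx.Pool(2, initializer=init_worker) as pool:
        for res in pool.map(compute, ["test.wav", "test.wav"]):
            print(res)
```

The trade-off is GPU memory: two copies of the model live on the card at once, which is fine for whisper-tiny but may not scale to larger checkpoints.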