How to perform parallel inference using multiple GPUs

Hi, is there a way to create an instance of an LLM and load that model onto two different GPUs? Note that the instances will be created in two different Celery tasks (asynchronous tasks/jobs).
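
To make the setup concrete, here is a rough sketch of what I have in mind, assuming a Transformers causal LM and a Redis broker for Celery (the model ID, broker URL, and module/task names are just placeholders, not my actual code):

```python
# Hypothetical sketch only: model ID, broker URL, and module name are placeholders.
import torch
from celery import Celery
from transformers import AutoModelForCausalLM, AutoTokenizer

app = Celery("tasks", broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

MODEL_ID = "gpt2"  # placeholder; the real model is larger

# One cached copy of the model per device, so repeated task calls
# in the same worker process reuse the already-loaded weights.
_models = {}

def _get_model(device):
    if device not in _models:
        tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
        model = AutoModelForCausalLM.from_pretrained(MODEL_ID).to(device)
        model.eval()
        _models[device] = (tokenizer, model)
    return _models[device]

@app.task
def generate(prompt: str, device: str) -> str:
    # Each task call pins its model copy to one GPU, e.g. "cuda:0" or "cuda:1".
    tokenizer, model = _get_model(device)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```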

Distributed inference with multiple GPUs (huggingface.co)

I went through the documentation, but I still don't know how exactly I will be able to handle both responses from the "result".
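
For example, is collecting the two responses supposed to look something like this? This is just my guess based on Celery's `delay()` / `AsyncResult.get()` API, using the hypothetical `generate` task from the sketch above:

```python
# Dispatch one task per GPU and keep the two AsyncResult handles.
r0 = generate.delay("prompt for the first request", "cuda:0")
r1 = generate.delay("prompt for the second request", "cuda:1")

# .get() blocks until each task finishes and returns its decoded text.
responses = [r0.get(timeout=120), r1.get(timeout=120)]
print(responses)
```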