Running inference on tuned models across multiple GPUs

I'm having a tough time running my tuned model across multiple GPUs.

I have several .pt files that I tuned with torchtune. I can run inference with the generate recipe on the LoRA checkpoint, but not at full precision, because no single one of my cards can hold the whole model. I'm wondering what the right approach is here; I've tried various methods but am struggling. The checkpoint files are:

  • hf_model_0001_2.pt
  • hf_model_0002_2.pt
  • hf_model_0003_2.pt
  • hf_model_0004_2.pt
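For reference, one of the things I've tried is manually merging the shards above into a single state dict on CPU (a sketch, not a working solution; the glob pattern just matches the file names listed):

```python
import glob
import torch

def merge_shards(pattern: str) -> dict:
    """Merge sharded checkpoint files into one state dict, kept on CPU."""
    state_dict = {}
    for path in sorted(glob.glob(pattern)):
        # map_location="cpu" keeps every shard off the GPU while merging
        shard = torch.load(path, map_location="cpu")
        state_dict.update(shard)
    return state_dict

full_state = merge_shards("hf_model_000*_2.pt")
```

This merges fine, but I then can't move the resulting full-precision model onto any single GPU, which is exactly the problem. What I'm missing is how to shard or dispatch the model across the cards for inference.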