How to load a model on multiple GPUs for inference?

I want to speed up the inference time of my pre-trained model. Here’s how I load the model:

 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 from optimum.bettertransformer import BetterTransformer

 tokenizer = AutoTokenizer.from_pretrained(load_path)
 model = AutoModelForSequenceClassification.from_pretrained(load_path, device_map="auto")
 model = BetterTransformer.transform(model)

Here’s how I run the inference script:

 CUDA_VISIBLE_DEVICES=0,1 python model.py
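
For context, the forward pass inside model.py is essentially the following (a simplified sketch; the variable names are illustrative and the real script iterates over my dataset):

 import torch

 texts = ["some example sentence"]  # placeholder batch; tokenizer/model come from the loading snippet above
 inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(model.device)
 with torch.no_grad():
     logits = model(**inputs).logits
 preds = logits.argmax(dim=-1)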

However, nvidia-smi shows that only GPU 0 is used to load the model, not both GPU 0 and GPU 1.
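
In case it helps with debugging, the placement that accelerate computes can also be inspected directly; hf_device_map is the attribute transformers fills in when a device_map is passed to from_pretrained (checked right after loading, before the BetterTransformer conversion):

 # continuing from the loading code above
 print(model.hf_device_map)  # maps each module name to the device it was assigned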

PS: When I remove CUDA_VISIBLE_DEVICES=0,1, I get this error instead:

 ValueError: The device_map provided does not give any device for the following parameters: roberta.encoder.layer.15.in_proj_weight, roberta.encoder.layer.15.in_proj_bias, roberta.encoder.layer.15.out_proj_weight, roberta.encoder.layer.15.out_proj_bias, roberta.encoder.layer.15.linear1_weight, roberta.encoder.layer.15.linear1_bias, roberta.encoder.layer.15.linear2_weight, roberta.encoder.layer.15.linear2_bias, roberta.encoder.layer.15.norm1_weight, roberta.encoder.layer.15.norm1_bias, roberta.encoder.layer.15.norm2_weight, roberta.encoder.layer.15.norm2_bias