How to load a model on multiple GPUs for inference?

I want to speed up the inference time of my pre-trained model. Here’s how I load the model:

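 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 from optimum.bettertransformer import BetterTransformer

 tokenizer = AutoTokenizer.from_pretrained(load_path)
 model = AutoModelForSequenceClassification.from_pretrained(load_path, device_map="auto")
 model = BetterTransformer.transform(model)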
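Instead of leaving placement to `device_map="auto"`, `from_pretrained` also accepts an explicit dict that maps module names to devices. The helper below is a hypothetical sketch that builds such a dict by splitting the encoder layers evenly over GPUs 0 and 1. The module names (`roberta.embeddings`, `roberta.encoder.layer.{i}`, `classifier`) are assumptions about the RobertaForSequenceClassification layout and should be checked against `model.named_modules()`; the layer count is an assumption too.

```python
def split_layers_across_gpus(num_layers, gpus=(0, 1), prefix="roberta"):
    """Build a device_map dict that spreads encoder layers evenly over `gpus`.

    ASSUMPTION: module names follow the RobertaForSequenceClassification
    layout (embeddings, encoder.layer.N, classifier). Verify them first.
    """
    # Embeddings sit on the first GPU, where the input arrives.
    device_map = {f"{prefix}.embeddings": gpus[0]}
    # Ceiling division so every layer gets a GPU index < len(gpus).
    per_gpu = -(-num_layers // len(gpus))
    for i in range(num_layers):
        device_map[f"{prefix}.encoder.layer.{i}"] = gpus[i // per_gpu]
    # The classification head follows the last encoder layer.
    device_map["classifier"] = gpus[-1]
    return device_map


if __name__ == "__main__":
    # 24 layers is an assumption (a roberta-large-sized model).
    for name, device in split_layers_across_gpus(24).items():
        print(name, "->", device)
```

The result could be passed as `device_map=split_layers_across_gpus(24)` in place of `"auto"`; whether `BetterTransformer.transform` accepts the model afterwards is not verified here. Note that this kind of split spreads the weights across both GPUs, but a single forward pass still runs through the layers one GPU after the other.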

Here’s how I run the inference script:


However, nvidia-smi shows that only GPU 0 is used to load the model, not both GPU 0 and GPU 1.

PS: When I remove CUDA_VISIBLE_DEVICES=0,1, I get this error:

 ValueError: The device_map provided does not give any device for the following parameters: roberta.encoder.layer.15.in_proj_weight, roberta.encoder.layer.15.in_proj_bias, roberta.encoder.layer.15.out_proj_weight, roberta.encoder.layer.15.out_proj_bias, roberta.encoder.layer.15.linear1_weight, roberta.encoder.layer.15.linear1_bias, roberta.encoder.layer.15.linear2_weight, roberta.encoder.layer.15.linear2_bias, roberta.encoder.layer.15.norm1_weight, roberta.encoder.layer.15.norm1_bias, roberta.encoder.layer.15.norm2_weight, roberta.encoder.layer.15.norm2_bias
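The error lists only the fused parameter names that BetterTransformer uses (`in_proj_weight`, `linear1_weight`, `norm1_weight`, ...), all from layer 15. One plausible reading, not confirmed here, is that devices are assigned by module-name prefix: if layer 15 had been split across GPUs at submodule level, the new parameters living directly on the layer would match none of the keys. The function below is a hypothetical, simplified prefix check (not Accelerate's actual code), applied to a made-up device map.

```python
def uncovered_params(param_names, device_map):
    """Return the parameter names that no device_map key covers.

    A key covers a name if it equals it or is a dotted prefix of it
    ("a.b" covers "a.b.weight" but not "a.bc.weight"); "" covers everything.
    """
    missing = []
    for name in param_names:
        covered = any(
            key == "" or name == key or name.startswith(key + ".")
            for key in device_map
        )
        if not covered:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical map in which layer 15 was split into submodules.
    device_map = {
        "roberta.encoder.layer.14": 0,
        "roberta.encoder.layer.15.attention": 0,
        "roberta.encoder.layer.15.intermediate": 1,
        "roberta.encoder.layer.15.output": 1,
    }
    params = [
        "roberta.encoder.layer.14.in_proj_weight",  # covered by layer.14
        "roberta.encoder.layer.15.in_proj_weight",  # no key covers it
        "roberta.encoder.layer.15.norm2_bias",      # no key covers it
    ]
    print(uncovered_params(params, device_map))
```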