I want to speed up the inference time of my pre-trained model. Here's how I load it:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from optimum.bettertransformer import BetterTransformer

tokenizer = AutoTokenizer.from_pretrained(load_path)
model = AutoModelForSequenceClassification.from_pretrained(load_path, device_map="auto")
model = BetterTransformer.transform(model)
```
Here's how I run the inference script:

```
CUDA_VISIBLE_DEVICES=0,1 python model.py
```
However, `nvidia-smi` shows that only GPU 0 is used to load the model, not both 0 and 1.
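For reference, when `device_map="auto"` is used, accelerate records the per-module placement in `model.hf_device_map`, which can be inspected to see whether both GPUs actually received weights. A minimal sketch (using an illustrative, hand-written map, since the real one comes from the loaded model):

```python
from collections import Counter

# Illustrative placement map shaped like model.hf_device_map
# (module names and device ids here are made up for the example).
device_map = {
    "roberta.embeddings": 0,
    "roberta.encoder.layer.0": 0,
    "roberta.encoder.layer.15": 1,
    "classifier": 1,
}

# Count modules per device to see how the model was split across GPUs.
counts = Counter(device_map.values())
print(counts)  # Counter({0: 2, 1: 2})
```

On a real model, `Counter(model.hf_device_map.values())` with a single key would confirm that everything landed on one device.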
PS: When I remove `CUDA_VISIBLE_DEVICES=0,1`, I get this error:

```
ValueError: The device_map provided does not give any device for the following parameters: roberta.encoder.layer.15.in_proj_weight, roberta.encoder.layer.15.in_proj_bias, roberta.encoder.layer.15.out_proj_weight, roberta.encoder.layer.15.out_proj_bias, roberta.encoder.layer.15.linear1_weight, roberta.encoder.layer.15.linear1_bias, roberta.encoder.layer.15.linear2_weight, roberta.encoder.layer.15.linear2_bias, roberta.encoder.layer.15.norm1_weight, roberta.encoder.layer.15.norm1_bias, roberta.encoder.layer.15.norm2_weight, roberta.encoder.layer.15.norm2_bias
```