In the inference tutorial "Getting Started with DeepSpeed for Inferencing Transformer based Models" (DeepSpeed), the following example is given:
# Filename: gpt-neo-2.7b-generation.py
import os
import deepspeed
import torch
from transformers import pipeline
local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-2.7B',
                     device=local_rank)
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.float,
                                           replace_with_kernel_inject=True)
string = generator("DeepSpeed is", do_sample=True, min_length=50)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)
I want to know:
- If I run it with deepspeed --num_gpus 2 gpt-neo-2.7b-generation.py, does the generator = pipeline(...) statement run once or twice? (See the small check sketch after this list.)
- Do I need to do anything special for the different ranks (e.g., on different GPUs or machines)?
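To make the first question concrete, here is a minimal sketch of how I would check it. This is my own test script, not from the tutorial; it only assumes the LOCAL_RANK and WORLD_SIZE environment variables that the deepspeed launcher sets for each process it spawns:

# Filename: rank-check.py  (hypothetical test script, not from the tutorial)
import os

# Set by the deepspeed launcher for every process it starts.
local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))

# If this line prints twice under --num_gpus 2, then every top-level
# statement (including the pipeline(...) call) is executed once per rank.
print(f"rank {local_rank} of {world_size} reached this line")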