[deepspeed] bigscience/T0* multi-gpu text generation

Hello,

I am trying to adapt the solution from [[deepspeed] `bigscience/T0*` multi-gpu inference with ZeRO · Issue #15399 · huggingface/transformers · GitHub].
My task is along the same lines but a little more demanding, and I am struggling to make the adjustments.
I would appreciate any help.

The description of the task:

  1. I have a list of paths to CSV documents:
    doc_paths = [doc1, doc2, doc3, …, docN]
  2. Each document contains a dataframe with two columns, dataframe['question'] and dataframe['context'], with around 25 rows per dataframe.
  3. Without parallelization, I generate the answers like this:

import pandas as pd

for path in doc_paths:
    dataframe = pd.read_csv(path)
    for index, row in dataframe.iterrows():
        query_and_docs = "question: {} context: {}".format(row['question'], row['context'])
        model_input = tokenizer(query_and_docs, padding=True, return_tensors="pt")
        generated_answers_encoded = model.generate(input_ids=model_input["input_ids"].to(device),
                                                   attention_mask=model_input["attention_mask"].to(device),
                                                   min_length=200,
                                                   max_length=400,
                                                   do_sample=False,
                                                   early_stopping=True,
                                                   num_beams=8,
                                                   temperature=1.0,
                                                   top_k=None,
                                                   top_p=None,
                                                   eos_token_id=tokenizer.eos_token_id,
                                                   no_repeat_ngram_size=3,
                                                   num_return_sequences=1)
        answer = tokenizer.batch_decode(generated_answers_encoded, skip_special_tokens=True,
                                        clean_up_tokenization_spaces=True)[0]
        # row is a copy, so assigning to it would not modify the dataframe;
        # write back through .at instead
        dataframe.at[index, 'answer'] = answer
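For context on what I mean by parallelizing: since the documents are independent, the obvious approach is data parallelism, where each rank processes a disjoint slice of the document list. Below is a minimal sketch of just the sharding logic (the `rank`/`world_size` names are illustrative; in practice they would come from `torch.distributed` or DeepSpeed's launcher environment variables). Note that if the model is instead sharded with ZeRO-3 across all GPUs, every rank has to participate in each `generate` call, so this kind of split would apply per model replica rather than per GPU — which is exactly the part I am unsure how to set up.

```python
def shard_for_rank(doc_paths, rank, world_size):
    """Round-robin split: rank r handles documents r, r+world_size, ..."""
    return doc_paths[rank::world_size]

# Example: 7 documents across 3 ranks
docs = [f"doc{i}.csv" for i in range(7)]
shards = [shard_for_rank(docs, r, 3) for r in range(3)]
# Every document is assigned to exactly one rank, none are dropped
assert sorted(sum(shards, [])) == sorted(docs)
```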

Question:

  1. How can I adjust this code for multi-GPU inference?
  2. Would a combination of GPUs and CPUs (say, 40 GPUs and 40 CPUs) be more efficient than GPUs alone (40)?
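Regarding question 2, my understanding from the linked issue is that CPUs mainly help via ZeRO-3 parameter offload, which trades GPU memory for host-to-device transfer time. A sketch of the relevant fragment of a DeepSpeed config, as I currently understand it (the numeric values are illustrative, not tuned):

```python
# Sketch of a ZeRO-3 inference config with CPU parameter offload.
# Batch size and threshold values are illustrative, not tuned.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "stage3_param_persistence_threshold": 10 * 1024,
    },
    "train_micro_batch_size_per_gpu": 1,
}
```

My impression is that whether offload wins depends on model size versus aggregate GPU memory: if the model already fits across the GPUs without offload, moving parameters through the CPU usually slows generation down rather than speeding it up. Is that correct?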