Hello,
I am trying to adapt the solution from [[deepspeed] `bigscience/T0*` multi-gpu inference with ZeRO · Issue #15399 · huggingface/transformers · GitHub].
My task is along the same lines but a little more demanding, and I am struggling to make the adjustments.
I would appreciate any help.
The description of the task:
- I have a list of CSV files: `list = [doc1, doc2, doc3, …, docN]`.
- Each file contains a dataframe with two columns, `dataframe['question']` and `dataframe['context']`, and around 25 rows (a minimal example of the expected layout is sketched right after the code below).
- Without parallelization, I generate the text by using:
```python
import pandas as pd

# `tokenizer`, `model`, and `device` are created beforehand (a seq2seq
# checkpoint on a single GPU); `list` is the list of CSV paths from above.
for element in list:
    dataframe = pd.read_csv(element)
    for index, row in dataframe.iterrows():
        query_and_docs = "question: {} context: {}".format(row['question'], row['context'])
        model_input = tokenizer(query_and_docs, padding=True, return_tensors="pt")
        generated_answers_encoded = model.generate(input_ids=model_input["input_ids"].to(device),
                                                   attention_mask=model_input["attention_mask"].to(device),
                                                   min_length=200,
                                                   max_length=400,
                                                   do_sample=False,
                                                   early_stopping=True,
                                                   num_beams=8,
                                                   temperature=1.0,
                                                   top_k=None,
                                                   top_p=None,
                                                   eos_token_id=tokenizer.eos_token_id,
                                                   no_repeat_ngram_size=3,
                                                   num_return_sequences=1)
        answer = tokenizer.batch_decode(generated_answers_encoded,
                                        skip_special_tokens=True,
                                        clean_up_tokenization_spaces=True)[0]
        # Write back via .loc: assigning to the row copy from iterrows()
        # does not persist into the dataframe.
        dataframe.loc[index, 'answer'] = answer
```
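For reference, this is roughly what one of the input files looks like. The questions and contexts are made-up placeholders; only the column names matter:

```python
import pandas as pd

# Made-up example of one input file (placeholder rows, real column names).
pd.DataFrame(
    {
        "question": ["What is ZeRO?", "What does num_beams control?"],
        "context": ["ZeRO partitions model states across data-parallel workers ...",
                    "num_beams sets how many beams beam search keeps per step ..."],
    }
).to_csv("doc1.csv", index=False)
```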
Questions:
- How can I adjust this code for multi-GPU inference? A rough sketch of the setup I am starting from is below.
- Would a combination of GPUs and CPUs (say, 40 GPUs and 40 CPUs) be more efficient than GPUs alone (40)?
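For context, here is roughly the ZeRO-3 inference skeleton from the linked issue that I am trying to build on. It is only a sketch: the checkpoint name, the `ds_config` values, and the single hard-coded input are placeholders, and the `HfDeepSpeedConfig` import path may differ between transformers versions. What I cannot figure out is how to spread my loop over files and rows across the ranks without the `generate()` calls getting out of sync.

```python
import os

import deepspeed
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from transformers.deepspeed import HfDeepSpeedConfig  # import path may vary by version

# Launched with: deepspeed --num_gpus <N> this_script.py
local_rank = int(os.getenv("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)
deepspeed.init_distributed()

model_name = "bigscience/T0_3B"  # placeholder checkpoint

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        # "offload_param": {"device": "cpu", "pin_memory": True},  # the CPU option my second question is about
        "stage3_param_persistence_threshold": 0,
    },
    "train_micro_batch_size_per_gpu": 1,
    "wall_clock_breakdown": False,
}

# HfDeepSpeedConfig must be created (and kept alive) before from_pretrained
# so the weights are loaded directly into ZeRO-3 shards across the GPUs.
dschf = HfDeepSpeedConfig(ds_config)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

ds_engine = deepspeed.initialize(model=model, config_params=ds_config)[0]
ds_engine.module.eval()
model = ds_engine.module

# With ZeRO-3 each rank holds only a slice of the parameters, so every rank
# must take part in every generate() call (hence synced_gpus=True).
query_and_docs = "question: ... context: ..."  # placeholder for one dataframe row
inputs = tokenizer(query_and_docs, return_tensors="pt").to(f"cuda:{local_rank}")
outputs = model.generate(input_ids=inputs["input_ids"],
                         attention_mask=inputs["attention_mask"],
                         num_beams=8,
                         min_length=200,
                         max_length=400,
                         synced_gpus=True)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```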