Hello,
I am trying to adapt the solution from [[deepspeed] `bigscience/T0*` multi-gpu inference with ZeRO · Issue #15399 · huggingface/transformers · GitHub].
My task is along the same lines but a little more demanding, and I am struggling to make the adjustments.
I would appreciate any help.
The description of the task:
- I have a list of CSV files: `list = [doc1, doc2, doc3, …, docN]`.
- Each file contains a dataframe with two columns, `dataframe['question']` and `dataframe['context']`, and around 25 rows (a minimal example of the expected layout is sketched right after the code below).
- Without parallelization, I generate the text by using:
```python
import pandas as pd

# `tokenizer`, `model`, and `device` are created beforehand (a seq2seq
# checkpoint on a single GPU); `list` is the list of CSV paths from above.
for element in list:
    dataframe = pd.read_csv(element)
    for index, row in dataframe.iterrows():
        query_and_docs = "question: {} context: {}".format(row['question'], row['context'])
        model_input = tokenizer(query_and_docs, padding=True, return_tensors="pt")
        generated_answers_encoded = model.generate(input_ids=model_input["input_ids"].to(device),
                                                   attention_mask=model_input["attention_mask"].to(device),
                                                   min_length=200,
                                                   max_length=400,
                                                   do_sample=False,
                                                   early_stopping=True,
                                                   num_beams=8,
                                                   temperature=1.0,
                                                   top_k=None,
                                                   top_p=None,
                                                   eos_token_id=tokenizer.eos_token_id,
                                                   no_repeat_ngram_size=3,
                                                   num_return_sequences=1)
        answer = tokenizer.batch_decode(generated_answers_encoded,
                                        skip_special_tokens=True,
                                        clean_up_tokenization_spaces=True)[0]
        # Write back via .loc: assigning to the row copy from iterrows()
        # does not persist into the dataframe.
        dataframe.loc[index, 'answer'] = answer
```
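For reference, this is roughly what one of the input files looks like. The questions and contexts are made-up placeholders; only the column names matter:

```python
import pandas as pd

# Made-up example of one input file (placeholder rows, real column names).
pd.DataFrame(
    {
        "question": ["What is ZeRO?", "What does num_beams control?"],
        "context": ["ZeRO partitions model states across data-parallel workers ...",
                    "num_beams sets how many beams beam search keeps per step ..."],
    }
).to_csv("doc1.csv", index=False)
```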
Questions:
- How can I adjust this code for multi-GPU inference? A rough sketch of the setup I am starting from is below.
- Would a combination of GPUs and CPUs (say, 40 GPUs and 40 CPUs) be more efficient than GPUs alone (40)?
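For context, here is roughly the ZeRO-3 inference skeleton from the linked issue that I am trying to build on. It is only a sketch: the checkpoint name, the `ds_config` values, and the single hard-coded input are placeholders, and the `HfDeepSpeedConfig` import path may differ between transformers versions. What I cannot figure out is how to spread my loop over files and rows across the ranks without the `generate()` calls getting out of sync.

```python
import os

import deepspeed
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from transformers.deepspeed import HfDeepSpeedConfig  # import path may vary by version

# Launched with: deepspeed --num_gpus <N> this_script.py
local_rank = int(os.getenv("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)
deepspeed.init_distributed()

model_name = "bigscience/T0_3B"  # placeholder checkpoint

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        # "offload_param": {"device": "cpu", "pin_memory": True},  # the CPU option my second question is about
        "stage3_param_persistence_threshold": 0,
    },
    "train_micro_batch_size_per_gpu": 1,
    "wall_clock_breakdown": False,
}

# HfDeepSpeedConfig must be created (and kept alive) before from_pretrained
# so the weights are loaded directly into ZeRO-3 shards across the GPUs.
dschf = HfDeepSpeedConfig(ds_config)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

ds_engine = deepspeed.initialize(model=model, config_params=ds_config)[0]
ds_engine.module.eval()
model = ds_engine.module

# With ZeRO-3 each rank holds only a slice of the parameters, so every rank
# must take part in every generate() call (hence synced_gpus=True).
query_and_docs = "question: ... context: ..."  # placeholder for one dataframe row
inputs = tokenizer(query_and_docs, return_tensors="pt").to(f"cuda:{local_rank}")
outputs = model.generate(input_ids=inputs["input_ids"],
                         attention_mask=inputs["attention_mask"],
                         num_beams=8,
                         min_length=200,
                         max_length=400,
                         synced_gpus=True)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```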