Inference with DataParallel

tl;dr: This is an attempt at using the DataParallel class with Hugging Face models, but I still can't figure it out. Could you give me some examples?

Hello, I would like to use my two GPUs to run inference with DataParallel. I adapted a script that works well on one GPU, but I'm stuck with an error:

from torch.nn.parallel import DataParallel
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM  # type: ignore


device = "cuda:0"
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distill-pegasus-cnn-16-4")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distill-pegasus-cnn-16-4")

model = DataParallel(model, device_ids=list(range(torch.cuda.device_count())))
model.to(device=device)

# df_chunk and max_length_abstract are defined earlier in the full script
batch = tokenizer(
    df_chunk["abstract"].to_list(),
    truncation=True,
    padding="longest",
    max_length=max_length_abstract,
    return_tensors="pt",
).to(device)
print("len token batch", batch["input_ids"].shape)
translated = model.generate(**batch)  # <--- error here
tldrs = tokenizer.batch_decode(translated, skip_special_tokens=True)

'DataParallel' object has no attribute 'generate'

So I replaced the faulty line with the following, using the __call__ method of PyTorch models:

translated = model(**batch)

but now I get the following error:


ValueError: Caught ValueError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/scratch/scampo01/condaenvs/torchA100_v2/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/scratch/scampo01/condaenvs/torchA100_v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/scampo01/condaenvs/torchA100_v2/lib/python3.9/site-packages/transformers/models/pegasus/modeling_pegasus.py", line 1390, in forward
    outputs = self.model(
  File "/scratch/scampo01/condaenvs/torchA100_v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/scampo01/condaenvs/torchA100_v2/lib/python3.9/site-packages/transformers/models/pegasus/modeling_pegasus.py", line 1250, in forward
    decoder_outputs = self.decoder(
  File "/scratch/scampo01/condaenvs/torchA100_v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/scampo01/condaenvs/torchA100_v2/lib/python3.9/site-packages/transformers/models/pegasus/modeling_pegasus.py", line 1014, in forward
    raise ValueError("You have to specify either decoder_input_ids or decoder_inputs_embeds")
ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds

So finally I replaced the faulty line with

translated = model(decoder_input_ids=batch["input_ids"], **batch)

(unlike generate(), forward() is a single teacher-forced pass, so it needs explicit decoder inputs). This seems to work, until the tokenizer cannot decode the outputs: the forward pass returns logits rather than token IDs, so I had to modify the end of the script like this to get anything out:

    # old: tldrs = tokenizer.batch_decode(translated, skip_special_tokens=True)
    tldrs = tokenizer.batch_decode(
        torch.argmax(translated.logits, dim=2), skip_special_tokens=True
    )

But my results and this code are so messy that it is obvious I missed something. I did not find many (any?) examples of how to use DataParallel with Hugging Face models for inference. Could you give me some pointers and tell me what is wrong in my script?

Thank you


I happened to come across the same question. Did you find a more elegant way to solve it? It seems that if you replace model.generate(batch["input_ids"]) with model(decoder_input_ids=batch["input_ids"], **batch) and tldrs = tokenizer.batch_decode(torch.argmax(translated.logits, dim=2)), then you are performing argmax (greedy) decoding. You cannot use the packaged decoding strategies of model.generate(), such as beam search, which is not convenient. See the sketch below.
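For what it's worth, here is a minimal sketch of the difference, reusing model, tokenizer, and batch from the script above (num_beams=4 and max_length=128 are illustrative values, not from the original post):

    # A single teacher-forced forward pass plus a per-position argmax: this is
    # what the workaround above computes (argmax decoding, no autoregression).
    outputs = model(decoder_input_ids=batch["input_ids"], **batch)
    greedy_ids = torch.argmax(outputs.logits, dim=2)

    # generate() on the underlying model supports packaged decoding strategies
    # such as beam search; .module exposes the model that DataParallel wraps.
    beam_ids = model.module.generate(**batch, num_beams=4, max_length=128)
    tldrs = tokenizer.batch_decode(beam_ids, skip_special_tokens=True)

Note that generate() reached through .module runs on a single device: DataParallel only scatters forward() calls, so generation itself is not parallelized this way.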

Since the model variable is a DataParallel instance, its attributes are accessible through model.module. Have you tried

translated = model.module.generate(**batch)

?
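For reference, a minimal end-to-end sketch of this suggestion (the input texts are placeholders; the model setup follows the original script):

    import torch
    from torch.nn.parallel import DataParallel
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    device = "cuda:0"
    tokenizer = AutoTokenizer.from_pretrained("sshleifer/distill-pegasus-cnn-16-4")
    model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distill-pegasus-cnn-16-4")
    model = DataParallel(model, device_ids=list(range(torch.cuda.device_count())))
    model.to(device)

    texts = ["First abstract ...", "Second abstract ..."]  # placeholder inputs
    batch = tokenizer(
        texts, truncation=True, padding="longest", return_tensors="pt"
    ).to(device)

    # .module is the wrapped seq2seq model, which does have generate()
    translated = model.module.generate(**batch)
    tldrs = tokenizer.batch_decode(translated, skip_special_tokens=True)

As noted above, this runs generation on one GPU only; to use both GPUs for generate() you would have to split the batch across devices yourself or move to torch.distributed / Accelerate.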

accelerator.unwrap_model(model) worked for me instead of model.module; a sketch follows.
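A minimal sketch of that route, assuming the model is prepared with Accelerate (the input texts are again placeholders; for multi-GPU, the script would be run with accelerate launch):

    from accelerate import Accelerator
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    accelerator = Accelerator()  # run with `accelerate launch script.py` for multi-GPU
    tokenizer = AutoTokenizer.from_pretrained("sshleifer/distill-pegasus-cnn-16-4")
    model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distill-pegasus-cnn-16-4")
    model = accelerator.prepare(model)

    texts = ["First abstract ...", "Second abstract ..."]  # placeholder inputs
    batch = tokenizer(
        texts, truncation=True, padding="longest", return_tensors="pt"
    ).to(accelerator.device)

    # unwrap_model strips the parallel wrapper so generate() is reachable again
    translated = accelerator.unwrap_model(model).generate(**batch)
    tldrs = tokenizer.batch_decode(translated, skip_special_tokens=True)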