Finetuning GPT2 using Multiple GPU and Trainer

aclifton314 · September 18, 2020, 7:03am

@valhalla
I encountered this error while trying to use multiple GPUs for training:

Traceback (most recent call last):
  File "run_finetune_gpt2.py", line 158, in <module>
    main()
  File "run_finetune_gpt2.py", line 145, in main
    trainer.train()
  File "/path/to/venvs/my-venv/lib/python3.6/site-packages/transformers/trainer.py", line 499, in train
    tr_loss += self._training_step(model, inputs, optimizer)
  File "/path/to/venvs/my-venv/lib/python3.6/site-packages/transformers/trainer.py", line 622, in _training_step
    outputs = model(**inputs)
  File "/path/to/venvs/my-venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/path/to/venvs/my-venv/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 156, in forward
    return self.gather(outputs, self.output_device)
  File "/path/to/venvs/my-venv/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 168, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/path/to/venvs/my-venv/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
  File "/path/to/venvs/my-venv/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/path/to/venvs/my-venv/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/path/to/venvs/my-venv/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 54, in forward
    assert all(map(lambda i: i.is_cuda, inputs))
AssertionError
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
wandb: You can sync this run to the cloud by running:
wandb: wandb sync wandb/dryrun-20200914_134757-1sih3p0q

Any thoughts about how to correct it?
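
One way to narrow this down, offered as a rough sketch rather than a confirmed fix: the final assertion in the traceback (`assert all(map(lambda i: i.is_cuda, inputs))`) fails when at least one tensor handed to `Gather` is not on a CUDA device. The helper below walks a nested output structure and reports where such a leaf sits. The name `find_non_cuda` and the `FakeTensor` stand-in are hypothetical, introduced only for illustration. The sketch assumes nothing about the model beyond leaves exposing an `is_cuda` attribute, as `torch.Tensor` does.

```python
from collections.abc import Mapping


def find_non_cuda(outputs, path="outputs"):
    """Return the paths of tensor-like leaves whose `is_cuda` is False.

    Hypothetical helper: it recurses through mappings, tuples and lists,
    and treats any object with an `is_cuda` attribute as a tensor.
    """
    bad = []
    if isinstance(outputs, Mapping):
        for key, value in outputs.items():
            bad += find_non_cuda(value, f"{path}[{key!r}]")
    elif isinstance(outputs, (tuple, list)):
        for index, value in enumerate(outputs):
            bad += find_non_cuda(value, f"{path}[{index}]")
    elif hasattr(outputs, "is_cuda") and not outputs.is_cuda:
        bad.append(path)
    return bad


class FakeTensor:
    """Stand-in for torch.Tensor so the sketch runs without a GPU."""

    def __init__(self, is_cuda):
        self.is_cuda = is_cuda


# Mimics a (loss, (hidden, extra)) tuple where one element stayed on the CPU.
demo = (FakeTensor(True), (FakeTensor(True), FakeTensor(False)))
print(find_non_cuda(demo))  # ['outputs[1][1]']
```

With a real model, the same helper could be applied to the result of a single-GPU forward pass, for example `find_non_cuda(model(**inputs))`, to see whether any returned tensor (such as a loss computed outside the model's device) lives on the CPU before `DataParallel` tries to gather it.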
