[Deepspeed] ZeRO-Infinity integration released and config changes

stas · April 26, 2021, 5:50pm

DeepSpeed ZeRO-Infinity (deepspeed==0.3.15) has just been integrated. You need the transformers master branch to use it.

There are 2 important changes that you need to be aware of if you’re already using DeepSpeed integration in transformers:

After this release, only config params that are set to auto get automatically overridden/set to the correct/recommended values; everything else is left as is. This avoids the previously confusing behavior, where you could never be quite sure what was overridden and what was not, even though the logger reported what it did override. The new behavior is completely unambiguous.

See examples
- zero2
- zero3
Full doc: Trainer — transformers 4.5.0.dev0 documentation
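
To make the "auto" rule concrete, here is a minimal sketch of a ZeRO-2 style config expressed as a Python dict, plus a small helper that lists which entries are set to "auto". The specific keys and values below are illustrative assumptions, not a copy of the linked zero2 example, and `auto_keys` is a hypothetical helper, not part of transformers or DeepSpeed.

```python
import json

# Illustrative config (assumed values): entries set to "auto" are the ones
# the Trainer integration fills in; everything else is used as written.
ds_config = {
    "fp16": {"enabled": "auto"},
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": "auto", "weight_decay": "auto"},
    },
    "zero_optimization": {"stage": 2, "overlap_comm": True},
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}


def auto_keys(cfg, prefix=""):
    """Return the dotted paths of all entries whose value is the string "auto"."""
    paths = []
    for key, value in cfg.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections such as "optimizer.params".
            paths.extend(auto_keys(value, path + "."))
        elif value == "auto":
            paths.append(path)
    return paths


if __name__ == "__main__":
    print("Overridden by the integration:", auto_keys(ds_config))
    print(json.dumps(ds_config, indent=2))
```

Anything the helper does not list (here, the ZeRO stage and `overlap_comm`) would be passed through unchanged under the new behavior.
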
If you are using massive models and aren’t using example scripts, make sure to read:

Full doc: Trainer — transformers 4.5.0.dev0 documentation

Everything else should work as before or better.

The docs were revamped a lot too - if you find anything unclear or lacking please let me know.

You probably want to install deepspeed master, though: 0.3.15 shipped with some debug prints left in place, which create a lot of noise. This has been fixed in master. So:

pip install git+https://github.com/microsoft/DeepSpeed
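
After installing, it can help to confirm which DeepSpeed build the environment actually picks up. A minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); `installed_version` is a hypothetical helper name:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("deepspeed:", installed_version("deepspeed"))
```
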

If you encounter any problems, please post an Issue and tag @stas00 in it. Thank you!

TOPRAN · April 27, 2021, 12:22pm

Thanks for your work. I tried DeepSpeed for Wav2Vec2 fine-tuning, and when I use the configuration file "ds_config_zero2.json", it reports the following error:

File "/root/anaconda3/envs/huggingface/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 259, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

So I made a change in the function _prepare_inputs() by adding ".half()":

def _prepare_inputs(self, inputs: Dict[str, Union[torch.Tensor, Any]]) -> Dict[str, Union[torch.Tensor, Any]]:
        """
        Prepare :obj:`inputs` before feeding them to the model, converting them to tensors if they are not already and
        handling potential state.
        """
        # Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
        for k, v in inputs.items():
            if isinstance(v, torch.Tensor):
                # inputs[k] = v.to(self.args.device)
                inputs[k] = v.to(self.args.device).half() # add .half() here

        if self.args.past_index >= 0 and self._past is not None:
            inputs["mems"] = self._past

        return inputs
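
Note that the change above applies .half() to every tensor in the batch, including integer tensors such as labels. A narrower variant would cast only floating-point tensors. The following is a minimal standalone sketch of that idea, not the Trainer's actual implementation; `cast_float_inputs` is a hypothetical helper, and whether it resolves the errors in this post is not confirmed here.

```python
from typing import Any, Dict

import torch


def cast_float_inputs(
    inputs: Dict[str, Any],
    device: torch.device,
    dtype: torch.dtype = torch.float16,
) -> Dict[str, Any]:
    """Move tensors to `device`; cast only floating-point tensors to `dtype`."""
    prepared = {}
    for key, value in inputs.items():
        if isinstance(value, torch.Tensor):
            value = value.to(device)
            # Integer tensors (labels, masks) keep their original dtype.
            if value.is_floating_point():
                value = value.to(dtype)
        prepared[key] = value
    return prepared


if __name__ == "__main__":
    batch = {
        "input_values": torch.zeros(2, 16000),  # float32 audio
        "labels": torch.tensor([[3, 7, 1]]),    # int64 token ids
    }
    out = cast_float_inputs(batch, torch.device("cpu"))
    print({k: v.dtype for k, v in out.items()})
```
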

I don’t know if this is the right way to change it, but then I got a new error:

  File "/root/anaconda3/envs/huggingface/lib/python3.7/site-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1607370141920/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f2ed6c508b2 in /root/anaconda3/envs/huggingface/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f2ed6ea2982 in /root/anaconda3/envs/huggingface/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f2ed6c3bb7d in /root/anaconda3/envs/huggingface/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5fea0a (0x7f2f13f8da0a in /root/anaconda3/envs/huggingface/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x5feab6 (0x7f2f13f8dab6 in /root/anaconda3/envs/huggingface/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x1a3f6e (0x55c7aa0a8f6e in /root/anaconda3/envs/huggingface/bin/python)
frame #6: <unknown function> + 0x10e34c (0x55c7aa01334c in /root/anaconda3/envs/huggingface/bin/python)
frame #7: <unknown function> + 0x216141 (0x55c7aa11b141 in /root/anaconda3/envs/huggingface/bin/python)
frame #8: <unknown function> + 0x10e318 (0x55c7aa013318 in /root/anaconda3/envs/huggingface/bin/python)
frame #9: <unknown function> + 0x1a3f50 (0x55c7aa0a8f50 in /root/anaconda3/envs/huggingface/bin/python)
frame #10: <unknown function> + 0x10e34c (0x55c7aa01334c in /root/anaconda3/envs/huggingface/bin/python)
frame #11: <unknown function> + 0x216141 (0x55c7aa11b141 in /root/anaconda3/envs/huggingface/bin/python)
frame #12: <unknown function> + 0x10e3a8 (0x55c7aa0133a8 in /root/anaconda3/envs/huggingface/bin/python)
frame #13: <unknown function> + 0x1a3f50 (0x55c7aa0a8f50 in /root/anaconda3/envs/huggingface/bin/python)
frame #14: <unknown function> + 0x10e34c (0x55c7aa01334c in /root/anaconda3/envs/huggingface/bin/python)
frame #15: <unknown function> + 0x216141 (0x55c7aa11b141 in /root/anaconda3/envs/huggingface/bin/python)
frame #16: <unknown function> + 0x10e3a8 (0x55c7aa0133a8 in /root/anaconda3/envs/huggingface/bin/python)
frame #17: <unknown function> + 0x1a3f50 (0x55c7aa0a8f50 in /root/anaconda3/envs/huggingface/bin/python)
frame #18: <unknown function> + 0x10e34c (0x55c7aa01334c in /root/anaconda3/envs/huggingface/bin/python)
frame #19: <unknown function> + 0x216141 (0x55c7aa11b141 in /root/anaconda3/envs/huggingface/bin/python)
frame #20: <unknown function> + 0x10e3a8 (0x55c7aa0133a8 in /root/anaconda3/envs/huggingface/bin/python)
frame #21: <unknown function> + 0x1a3f50 (0x55c7aa0a8f50 in /root/anaconda3/envs/huggingface/bin/python)
frame #22: <unknown function> + 0x10e34c (0x55c7aa01334c in /root/anaconda3/envs/huggingface/bin/python)
frame #23: <unknown function> + 0x216141 (0x55c7aa11b141 in /root/anaconda3/envs/huggingface/bin/python)
frame #24: <unknown function> + 0x10e3a8 (0x55c7aa0133a8 in /root/anaconda3/envs/huggingface/bin/python)
frame #25: <unknown function> + 0x1a3f50 (0x55c7aa0a8f50 in /root/anaconda3/envs/huggingface/bin/python)
frame #26: <unknown function> + 0x10e34c (0x55c7aa01334c in /root/anaconda3/envs/huggingface/bin/python)
frame #27: <unknown function> + 0x216141 (0x55c7aa11b141 in /root/anaconda3/envs/huggingface/bin/python)
frame #28: <unknown function> + 0x10e3a8 (0x55c7aa0133a8 in /root/anaconda3/envs/huggingface/bin/python)
frame #29: <unknown function> + 0x1a3f50 (0x55c7aa0a8f50 in /root/anaconda3/envs/huggingface/bin/python)
frame #30: <unknown function> + 0x10e318 (0x55c7aa013318 in /root/anaconda3/envs/huggingface/bin/python)
frame #31: <unknown function> + 0x1a3f50 (0x55c7aa0a8f50 in /root/anaconda3/envs/huggingface/bin/python)
frame #32: <unknown function> + 0x10e3a8 (0x55c7aa0133a8 in /root/anaconda3/envs/huggingface/bin/python)
frame #33: <unknown function> + 0x1a3f50 (0x55c7aa0a8f50 in /root/anaconda3/envs/huggingface/bin/python)
frame #34: <unknown function> + 0xfd9c8 (0x55c7aa0029c8 in /root/anaconda3/envs/huggingface/bin/python)
frame #35: <unknown function> + 0x10eb77 (0x55c7aa013b77 in /root/anaconda3/envs/huggingface/bin/python)
frame #36: <unknown function> + 0x10eb8d (0x55c7aa013b8d in /root/anaconda3/envs/huggingface/bin/python)
frame #37: PyDict_SetItem + 0x502 (0x55c7aa068da2 in /root/anaconda3/envs/huggingface/bin/python)
frame #38: PyDict_SetItemString + 0x4f (0x55c7aa06986f in /root/anaconda3/envs/huggingface/bin/python)
frame #39: PyImport_Cleanup + 0xa0 (0x55c7aa0af5d0 in /root/anaconda3/envs/huggingface/bin/python)
frame #40: Py_FinalizeEx + 0x67 (0x55c7aa12a487 in /root/anaconda3/envs/huggingface/bin/python)
frame #41: <unknown function> + 0x237f03 (0x55c7aa13cf03 in /root/anaconda3/envs/huggingface/bin/python)
frame #42: _Py_UnixMain + 0x3c (0x55c7aa13d22c in /root/anaconda3/envs/huggingface/bin/python)
frame #43: __libc_start_main + 0xf5 (0x7f2f4d63d555 in /usr/lib64/libc.so.6)
frame #44: <unknown function> + 0x1dce90 (0x55c7aa0e1e90 in /root/anaconda3/envs/huggingface/bin/python)

/opt/conda/conda-bld/pytorch_1607370141920/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1607370141920/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [1,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1607370141920/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [2,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1607370141920/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [3,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.

I also tried using the configuration file "ds_config_zero3.json", and it gives a new error:
nn.functional.linear has been overridden with a more memory efficient version. This will persist unless manually reset.
Traceback (most recent call last):
  File "run_libri960.py", line 633, in <module>
    main()
  File "run_libri960.py", line 484, in main
    vocab_size=len(processor.tokenizer),
  File "/root/anaconda3/envs/huggingface/lib/python3.7/site-packages/transformers-4.6.0.dev0-py3.7.egg/transformers/modeling_utils.py", line 1131, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/root/anaconda3/envs/huggingface/lib/python3.7/site-packages/transformers-4.6.0.dev0-py3.7.egg/transformers/models/wav2vec2/modeling_wav2vec2.py", line 976, in __init__
    self.wav2vec2 = Wav2Vec2Model(config)
  File "/root/anaconda3/envs/huggingface/lib/python3.7/site-packages/transformers-4.6.0.dev0-py3.7.egg/transformers/models/wav2vec2/modeling_wav2vec2.py", line 782, in __init__
    self.encoder = Wav2Vec2EncoderStableLayerNorm(config)
  File "/root/anaconda3/envs/huggingface/lib/python3.7/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 197, in wrapper
    f(module, *args, **kwargs)
  File "/root/anaconda3/envs/huggingface/lib/python3.7/site-packages/transformers-4.6.0.dev0-py3.7.egg/transformers/models/wav2vec2/modeling_wav2vec2.py", line 595, in __init__
    self.pos_conv_embed = Wav2Vec2PositionalConvEmbedding(config)
  File "/root/anaconda3/envs/huggingface/lib/python3.7/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 197, in wrapper
    f(module, *args, **kwargs)
  File "/root/anaconda3/envs/huggingface/lib/python3.7/site-packages/transformers-4.6.0.dev0-py3.7.egg/transformers/models/wav2vec2/modeling_wav2vec2.py", line 200, in __init__
    self.conv = nn.utils.weight_norm(self.conv, name="weight", dim=2)
  File "/root/anaconda3/envs/huggingface/lib/python3.7/site-packages/torch/nn/utils/weight_norm.py", line 105, in weight_norm
    WeightNorm.apply(module, name, dim)
  File "/root/anaconda3/envs/huggingface/lib/python3.7/site-packages/torch/nn/utils/weight_norm.py", line 44, in apply
    module.register_parameter(name + '_g', Parameter(norm_except_dim(weight, 2, dim).data))
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 2)

Here is the command I executed in the terminal:

deepspeed --include="localhost:3,4" run_libri960.py \
--output_dir=${output_dir} \
--num_train_epochs="30" \
--deepspeed=${ds_config_dir} \
--per_device_train_batch_size="4" \
--per_device_eval_batch_size="4" \
--evaluation_strategy="steps" \
--save_total_limit="3" \
--save_steps="2000" \
--eval_steps="500" \
--logging_steps="50" \
--learning_rate="3e-5" \
--warmup_steps="500" \
--model_name_or_path=${model_name_or_path} \
--preprocessing_num_workers="32" \
--group_by_length \
--freeze_feature_extractor \
--logging_dir=${logging_dir} \
--gradient_accumulation_steps="2"

I’d appreciate it if you could reply to me!
@patrickvonplaten @valhalla
By the way, I tried using DDP to fix the uneven memory distribution across GPUs during multi-GPU training, but I find that OOM errors are more likely with DDP. Why is that?

stas · April 28, 2021, 3:06am

@TOPRAN, are you the same person who posted this issue? [wav2vec] deepspeed eval bug in the case of >1 gpus · Issue #11446 · huggingface/transformers · GitHub. It appears very similar. See my follow-up in that issue, which partially addresses this problem. Let’s continue the discussion in the issue.
