DeepSpeed integration for HuggingFace Seq2SeqTrainingArguments

Hello. I'm doing a classic Whisper fine-tune using the Hugging Face :hugs: libraries. My data is already in the right format and has been mapped. Fine-tuning whisper-base works, but I need to fine-tune whisper-large-v2, which requires ~24GB of GPU VRAM to train. I have access to 2 V100 cards with 16GB of VRAM each. That is 32GB in total, but split between 2 GPUs, hence I need the DeepSpeed integration to shard training across multiple GPUs.
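For intuition, here is the back-of-the-envelope memory math behind those numbers. This is a rough sketch: the ~1.55B parameter count for whisper-large-v2 and the 16 bytes/parameter rule for fp16 mixed-precision Adam (weights + grads + fp32 optimizer states) are my assumptions, and activations and buffers come on top of this.

```python
# Rough memory estimate for fine-tuning whisper-large-v2 (~1.55B params, assumed).
# Assumed per-parameter cost with fp16 mixed precision + Adam:
#   2 B fp16 weights + 2 B fp16 grads + 12 B fp32 optimizer states
#   (master weights, momentum, variance) = 16 B, excluding activations.
params = 1.55e9

full_gb = params * (2 + 2 + 12) / 1024**3
print(f"single GPU, no sharding: ~{full_gb:.0f} GB")  # ~23 GB, matching the ~24 GB figure

# ZeRO stage 2 partitions optimizer states and gradients across the GPUs:
zero2_gb = params * (2 + (2 + 12) / 2) / 1024**3
print(f"per GPU with ZeRO-2 on 2 GPUs: ~{zero2_gb:.0f} GB")  # ~13 GB, a borderline fit in 16 GB
```

This is why ZeRO-2 (which shards the dominant optimizer-state term) is the natural choice here, rather than plain data parallelism, which replicates all 16 bytes/parameter on each card.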

Here is my Seq2SeqTrainingArguments:

```python
training_args = Seq2SeqTrainingArguments(
    output_dir='/home/camerono/Cameron-oos/whisper-rxn-tuned',  # change to a repo name of your choice
    deepspeed='/home/camerono/whisper-finetuning/testEnvFinetuning/ds_config.json',
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,  # increase by 2x for every 2x decrease in batch size
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=4000,
    gradient_checkpointing=True,
    fp16=True,  # change to False if using CPU only
    evaluation_strategy='steps',
    per_device_eval_batch_size=8,
    predict_with_generate=True,
    generation_max_length=225,
    save_steps=1000,
    eval_steps=1000,
    logging_steps=25,
    report_to=['tensorboard'],
    load_best_model_at_end=True,
    metric_for_best_model='wer',
    greater_is_better=False,
    push_to_hub=True,
)
```
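For reference, here is the effective global batch size these arguments yield (assuming the 2-GPU setup described above). This is the quantity that stays constant under the batch-size/accumulation trade mentioned in the comment:

```python
per_device_train_batch_size = 16
num_gpus = 2                      # the two V100s (assumption: both ranks train)
gradient_accumulation_steps = 1

# Global batch size seen by the optimizer per update step.
effective_batch_size = per_device_train_batch_size * num_gpus * gradient_accumulation_steps
print(effective_batch_size)  # 32
```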

And here is my ds_config.json file contents:
```json
{
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": 1e-5,
      "betas": [0.9, 0.999],
      "eps": 1e-8,
      "weight_decay": 0.01
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": 1e-5,
      "warmup_num_steps": 500
    }
  }
}
```
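As an aside, the Transformers DeepSpeed integration also accepts `"auto"` for values that overlap with `Seq2SeqTrainingArguments` (learning rate, warmup, batch size, fp16), letting the Trainer fill them in so the two files cannot drift out of sync. A minimal sketch of the same config in that style (the exact set of keys kept here is my choice, not a requirement):

```json
{
  "fp16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "optimizer": {
    "type": "AdamW",
    "params": { "lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto" }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": { "warmup_min_lr": "auto", "warmup_max_lr": "auto", "warmup_num_steps": "auto" }
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```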

I do not have a CUDA/Torch mismatch: I'm using nvcc 11.5 and torch 1.11, which are compatible as can be seen here: PyTorch Release 21.12 - NVIDIA Docs.

The error I'm getting:
```
[2024-02-22 10:32:12,164] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Using /home/camerono/.cache/torch_extensions/py310_cu115 as PyTorch extensions root...
Using /home/camerono/.cache/torch_extensions/py310_cu115 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/camerono/.cache/torch_extensions/py310_cu115/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/include -isystem /home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/include/TH -isystem /home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/include/THC -isystem /home/camerono/.conda/envs/train/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -std=c++14 -c /home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
FAILED: multi_tensor_adam.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/include -isystem /home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/include/TH -isystem /home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/include/THC -isystem /home/camerono/.conda/envs/train/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -std=c++14 -c /home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with '...':
  435 |         function(_Functor&& __f)
      |                                 ^
/usr/include/c++/11/bits/std_function.h:435:145: note:         '_ArgTypes'
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with '...':
  530 |         operator=(_Functor&& __f)
      |                                  ^
/usr/include/c++/11/bits/std_function.h:530:146: note:         '_ArgTypes'
[2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/include -isystem /home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/include/TH -isystem /home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/include/THC -isystem /home/camerono/.conda/envs/train/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -c /home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1740, in _run_ninja_build
    subprocess.run(
  File "/home/camerono/.conda/envs/train/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/camerono/whisper-finetuning/testEnvFinetuning/testingEnvTwo.py", line 161, in <module>
    trainer.train()
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/transformers/trainer.py", line 1530, in train
    return inner_training_loop(
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/transformers/trainer.py", line 1690, in _inner_training_loop
    model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/accelerate/accelerator.py", line 1220, in prepare
    result = self._prepare_deepspeed(*args)
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/accelerate/accelerator.py", line 1605, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/__init__.py", line 176, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 307, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1231, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1308, in _configure_basic_optimizer
    optimizer = FusedAdam(
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 94, in __init__
    fused_adam_cuda = FusedAdamBuilder().load()
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 478, in load
    return self.jit_load(verbose)
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 522, in jit_load
    op_module = load(name=self.name,
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1144, in load
    return _jit_compile(
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1357, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1469, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1756, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused_adam'
Loading extension module fused_adam...
Traceback (most recent call last):
  File "/home/camerono/whisper-finetuning/testEnvFinetuning/testingEnvTwo.py", line 161, in <module>
    trainer.train()
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/transformers/trainer.py", line 1530, in train
    return inner_training_loop(
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/transformers/trainer.py", line 1690, in _inner_training_loop
    model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/accelerate/accelerator.py", line 1220, in prepare
    result = self._prepare_deepspeed(*args)
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/accelerate/accelerator.py", line 1605, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/__init__.py", line 176, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 307, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1231, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1308, in _configure_basic_optimizer
    optimizer = FusedAdam(
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 94, in __init__
    fused_adam_cuda = FusedAdamBuilder().load()
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 478, in load
    return self.jit_load(verbose)
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 522, in jit_load
    op_module = load(name=self.name,
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1144, in load
    return _jit_compile(
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1382, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/camerono/.conda/envs/train/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1775, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /home/camerono/.cache/torch_extensions/py310_cu115/fused_adam/fused_adam.so: cannot open shared object file: No such file or directory
[2024-02-22 10:32:38,157] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 83715
[2024-02-22 10:32:38,162] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 83716
[2024-02-22 10:32:38,163] [ERROR] [launch.py:322:sigkill_handler] ['/home/camerono/.conda/envs/train/bin/python', '-u', 'testingEnvTwo.py', '--local_rank=1'] exits with return code = 1
```