DeepSpeed integration with Trainer in Colab crashing: TypeError: object.__init__() takes exactly one argument (the instance to initialize)

I'm using the Hugging Face Trainer to train models in a Colab notebook with no problems. I now need to use DeepSpeed since I'm running out of memory. DeepSpeed installed without any problems via pip install deepspeed (Torch 1.13 was already installed). When I run !ds_report in the notebook, everything looks good.
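For reference, the install and check cells were just:

!pip install deepspeed
!ds_report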

However, when I add deepspeed=ds_config_dict to the end of my TrainingArguments, it crashes with the following:

— START crash details —
/usr/local/lib/python3.8/dist-packages/transformers/deepspeed.py in __init__(self, config_file_or_dict)
     65         dep_version_check("accelerate")
     66         dep_version_check("deepspeed")
---> 67         super().__init__(config_file_or_dict)
     68

TypeError: object.__init__() takes exactly one argument (the instance to initialize)
— END crash details —

I've tried lots of combinations, e.g. different config dicts and the config stored in a JSON file on disk (sketched just below). I also looked for solutions online but haven't come across this problem anywhere. Any help appreciated. Thanks.
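The file-on-disk variant was roughly this (the filename here is arbitrary):

import json

# write the same config dict (shown below) to disk...
with open("ds_config.json", "w") as f:
    json.dump(ds_config_dict, f)

# ...then pass the path instead of the dict:
# TrainingArguments(..., deepspeed="ds_config.json")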

The following is the cell from my notebook.

ds_config_dict = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": True
        },
        "allgather_partitions": True,
        "allgather_bucket_size": 2e8,
        "reduce_scatter": True,
        "reduce_bucket_size": 2e8,
        "overlap_comm": True,
        "contiguous_gradients": True
    }
}
BS = 10
GRAD_ACC = 2
LR = 5e-5
WD = 0.01
WARMUP = 0.1
N_EPOCHS = 5
model_name = model_checkpoint.split("/")[-1]
!echo $ds_config_dict
args = TrainingArguments(
    f"{model_name}-finetuned-{source_lang}-to-{target_lang}",
    evaluation_strategy="epoch",
    logging_strategy="epoch",
    save_strategy="epoch",
    learning_rate=LR,
    per_device_train_batch_size=BS,
    per_device_eval_batch_size=BS,
    num_train_epochs=N_EPOCHS,
    weight_decay=WD,
    report_to="wandb",
    gradient_accumulation_steps=GRAD_ACC,
    warmup_ratio=WARMUP,
    fp16=True,
    deepspeed=ds_config_dict,
)
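The Trainer call itself is unchanged from my working non-DeepSpeed setup; a minimal sketch with placeholder names (the model, tokenizer, and dataset variables are illustrative, not my exact cell):

from transformers import Trainer

trainer = Trainer(
    model=model,                  # placeholder: the model being fine-tuned
    args=args,
    train_dataset=train_dataset,  # placeholder dataset names
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)

Per the traceback, though, the TypeError is raised while TrainingArguments itself is being constructed, so the Trainer is never reached.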

I needed the following and it's working now:

# DeepSpeed requires a distributed environment even when only one process is used.
# This emulates a launcher in the notebook.
import os

os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "9994"  # modify if RuntimeError: Address already in use
os.environ["RANK"] = "0"
os.environ["LOCAL_RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"
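Note that this cell has to run before the TrainingArguments cell: the DeepSpeed config is parsed while TrainingArguments is being constructed (that's the __init__ in the traceback above), so the emulated launcher environment needs to be in place by then.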


How do you set up DeepSpeed on a multi-GPU system? I'm using a RunPod instance with 8 A6000 GPUs and I get the same error.