PEFT VRAM usage

Hi,

I'm trying out PEFT and the VRAM usage seems excessive. For instance, I am running the peft_lora_clm_accelerate_ds_zero3_offload.py example on “bigscience/bloomz-3b” with the provided accelerate configuration, accelerate_ds_zero3_cpu_offload_config.yaml, on a system with 3x AMD Radeon Instinct MI50 accelerators with 16 GB of VRAM each, like this:

accelerate launch --num_processes=3 --config_file=accelerate_ds_zero3_cpu_offload_config.yaml ./peft_lora_clm_accelerate_ds_zero3_offload.py

This very quickly fills the VRAM of all three cards and then fails with an out-of-memory (OOM) error.
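For reference, this is the rough back-of-envelope estimate I am comparing against. It is only a sketch: the parameter count (about 3 billion for bloomz-3b), fp16 weights, and perfectly even sharding across the three cards are my assumptions, and it ignores activations, gradients, optimizer state, and framework overhead, so it is a lower bound at best.

```python
GIB = 2**30  # bytes per GiB


def weight_memory_gib(num_params, bytes_per_param=2, num_shards=1):
    """Memory needed just to hold the weights, in GiB, per shard."""
    return num_params * bytes_per_param / num_shards / GIB


# Assumption: bloomz-3b has roughly 3 billion parameters.
NUM_PARAMS = 3_000_000_000

full_copy = weight_memory_gib(NUM_PARAMS)               # fp16, unsharded
per_gpu = weight_memory_gib(NUM_PARAMS, num_shards=3)   # fp16, split over 3 GPUs

print(f"fp16 weights, one full copy: {full_copy:.2f} GiB")
print(f"fp16 weights, sharded 3 ways: {per_gpu:.2f} GiB per GPU")
```

Even a full unsharded fp16 copy of the weights should fit in 16 GB with room to spare by this estimate, which is why filling all three cards surprises me.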

I have also written some custom training scripts that run into the same problem.

This feels very wrong. I'm unsure how to debug this or profile the VRAM usage of the PEFT/transformers stack.
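The only approach I can think of is logging PyTorch's allocator statistics at a few points in the training loop (after model load, after the forward pass, after the backward pass) to see where the memory goes. Below is a minimal sketch of that idea. The helper names `format_gib`, `vram_snapshot`, and `log_vram` are my own, not part of PEFT or transformers, and I am assuming that the `torch.cuda` memory functions report meaningful numbers on a ROCm build as well.

```python
import importlib.util

GIB = 2**30  # bytes per GiB


def format_gib(num_bytes):
    """Format a byte count as GiB with two decimals."""
    return f"{num_bytes / GIB:.2f} GiB"


def vram_snapshot(device=None):
    """Return allocator stats in bytes, or None if torch or a GPU is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None
    return {
        # Memory currently occupied by live tensors.
        "allocated": torch.cuda.memory_allocated(device),
        # Memory held by the caching allocator (live tensors plus cache).
        "reserved": torch.cuda.memory_reserved(device),
        # High-water mark of allocated memory since start or last reset.
        "peak_allocated": torch.cuda.max_memory_allocated(device),
    }


def log_vram(tag, device=None):
    """Print a one-line memory summary labelled with `tag`."""
    stats = vram_snapshot(device)
    if stats is None:
        print(f"[{tag}] no GPU visible to PyTorch")
        return
    summary = ", ".join(f"{name}={format_gib(value)}" for name, value in stats.items())
    print(f"[{tag}] {summary}")


# Hypothetical usage: call this around the steps of interest in the training script.
log_vram("startup")
```

Calling `log_vram` right after the model is loaded and again after the first forward and backward passes should at least show whether the weights themselves or the activations are what fills the cards.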