Multiple GPU in SFTTrainer

How can I use SFTTrainer to leverage all GPUs automatically? If I add `device_map="auto"` I get a CUDA out-of-memory exception, even though I have 4x NVIDIA T4 GPUs.

CUDA is installed and my environment can see the available GPUs.


I would recommend taking a look at the example scripts here: alignment-handbook/scripts at main · huggingface/alignment-handbook · GitHub. It includes scripts that can be run with ZeRO-3 on 8 GPUs. For that, they define an Accelerate config. Basically, you need to run the script with `accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml` (in the YAML you define what hardware the script should run on).
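For reference, a ZeRO-3 Accelerate config looks roughly like the sketch below. This is not copied from the handbook repo; it is a minimal example using field names from Accelerate's DeepSpeed integration, adapted to `num_processes: 4` for a 4-GPU machine. Check the actual `deepspeed_zero3.yaml` in the repo for the authoritative version.

```yaml
# Sketch of an Accelerate config enabling DeepSpeed ZeRO-3 on 4 local GPUs.
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 3                    # shard params, gradients, and optimizer states
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true            # initialize large models directly in sharded form
mixed_precision: bf16              # T4s may require fp16 instead (no bf16 support)
num_machines: 1
num_processes: 4                   # one process per GPU
```

With ZeRO-3 the optimizer states, gradients, and parameters are sharded across the 4 GPUs, which is usually what you want instead of `device_map="auto"` (naive model parallelism) for fine-tuning.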

The problem is that I need to run it via Python, because I'm using Vertex AI Pipelines for MLOps. Is there a way to avoid calling `accelerate` on my script via bash?
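One option is to keep Python as the entry point and spawn `accelerate launch` as a subprocess from inside the pipeline step. A minimal sketch (the training script name `train_sft.py` is hypothetical; the config path is the one from the handbook repo mentioned above):

```python
import subprocess

def build_launch_cmd(script: str, config: str, *script_args: str) -> list:
    """Build the `accelerate launch` command line so it can be started
    from Python (e.g. a Vertex AI pipeline component) instead of bash."""
    return ["accelerate", "launch", "--config_file", config, script, *script_args]

cmd = build_launch_cmd(
    "train_sft.py",  # hypothetical training script
    "recipes/accelerate_configs/deepspeed_zero3.yaml",
)
# Hand the command to a subprocess; Python remains the entry point.
# subprocess.run(cmd, check=True)
```

Accelerate also provides `notebook_launcher`, which launches a training function across GPUs from within an already-running Python process, which may fit a pipeline component even better.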

Would using Optimum be a better solution if I want to run it via Python, without the CLI launch you mentioned?