Having trouble with Accelerate on my 2-GPU machine

Hello, I am trying to fine-tune a model on my personal 2-GPU machine with the Accelerate framework; my Hugging Face Accelerate configuration is given below. I am using Hugging Face PEFT to fine-tune an AutoModelForCausalLM. However, the model is loaded onto only one GPU and the other GPU sits idle. Training fails with the following error (the same message is printed twice, once per process):

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 15.73 GiB total capacity; 11.44 GiB already allocated; 54.19 MiB free; 11.44 GiB reserved in total by PyTorch)
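For reference, here is roughly how the model is set up (a minimal sketch; the model id and LoRA hyperparameters are placeholders, not my exact values):

```python
# Minimal sketch of the PEFT setup described above.
# The model id and LoRA hyperparameters below are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")  # placeholder model id
lora_config = LoraConfig(
    task_type="CAUSAL_LM",  # causal LM head, matching AutoModelForCausalLM
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```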

  • Accelerate version: 0.19.0
  • Platform: Linux-5.4.0-146-generic-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • Numpy version: 1.24.3
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • System RAM: 251.41 GB
  • GPU type: NVIDIA RTX A4000
  • Accelerate default config:
    - compute_environment: LOCAL_MACHINE
    - distributed_type: DEEPSPEED
    - mixed_precision: fp8
    - use_cpu: False
    - num_processes: 2
    - machine_rank: 0
    - num_machines: 1
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - deepspeed_config: {'gradient_accumulation_steps': 1, 'offload_optimizer_device': 'cpu', 'offload_param_device': 'nvme', 'zero3_init_flag': False, 'zero_stage': 2}
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env:
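The training loop follows the standard Accelerate pattern, continuing from the snippet above (a sketch with illustrative names; train_dataloader is an assumed PyTorch DataLoader whose batches include labels):

```python
# Rough shape of the training loop; `model` comes from the snippet above
# and `train_dataloader` is an assumed PyTorch DataLoader.
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # picks up the `accelerate config` settings shown above
print(f"process {accelerator.process_index} -> {accelerator.device}")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

model.train()
for batch in train_dataloader:
    outputs = model(**batch)          # batches include labels, so outputs.loss is set
    accelerator.backward(outputs.loss)  # DeepSpeed needs backward to go through the accelerator
    optimizer.step()
    optimizer.zero_grad()
```

I launch the script with accelerate launch (the script name is illustrative), so I expected each of the two processes to take its own GPU.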