Hello, I am trying to fine-tune a model on my personal 2-GPU machine with the Hugging Face accelerate framework.
My accelerate configuration is given below. I am using Hugging Face peft to fine-tune an AutoModelForCausalLM. However, the model is loaded onto only one GPU and the other GPU sits idle. Training fails with the following error (printed once per process): torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 15.73 GiB total capacity; 11.44 GiB already allocated; 54.19 MiB free; 11.44 GiB reserved in total by PyTorch)
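For reference, here is a simplified sketch of my training setup; the model name, LoRA hyperparameters, and dataloader are placeholders standing in for my actual script, not the exact code:

```python
# Simplified sketch of my training setup (model name, LoRA settings, and
# data handling are placeholders, not my exact script).
import torch
from accelerate import Accelerator
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

accelerator = Accelerator()  # picks up the accelerate/DeepSpeed config shown below

# Load the base model; "my-base-model" is a placeholder checkpoint name.
model = AutoModelForCausalLM.from_pretrained("my-base-model")

# Wrap the base model with a LoRA adapter via peft.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16)
model = get_peft_model(model, peft_config)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# train_dataloader is built elsewhere and omitted here; accelerator.prepare
# is what hands the model and optimizer over to DeepSpeed.
model, optimizer = accelerator.prepare(model, optimizer)
```

I do not move the model to a device myself; I rely on accelerate/DeepSpeed for placement.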
- Accelerate version: 0.19.0
- Platform: Linux-5.4.0-146-generic-x86_64-with-glibc2.29
- Python version: 3.8.10
- Numpy version: 1.24.3
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- System RAM: 251.41 GB
- GPU type: NVIDIA RTX A4000
- Accelerate default config:
- compute_environment: LOCAL_MACHINE
- distributed_type: DEEPSPEED
- mixed_precision: fp8
- use_cpu: False
- num_processes: 2
- machine_rank: 0
- num_machines: 1
- rdzv_backend: static
- same_network: True
- main_training_function: main
- deepspeed_config: {'gradient_accumulation_steps': 1, 'offload_optimizer_device': 'cpu', 'offload_param_device': 'nvme', 'zero3_init_flag': False, 'zero_stage': 2}
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- tpu_env:
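I launch training with the default config above (train.py stands in for my actual script name):

```shell
# Launch with the saved accelerate default config (2 processes, DeepSpeed).
accelerate launch train.py
```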