I am trying to run a fine-tuning job using the Accelerate library and I am getting an out-of-memory error in a multi-GPU setup.
Command run:
Here I am launching on 6 A100 GPUs (40 GB each):
accelerate launch run_clm_no_trainer.py \
--dataset_name wikitext \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--dataset_config_name wikitext-2-raw-v1 \
--model_name_or_path EleutherAI/gpt-j-6b \
--output_dir /tmp/test-clm
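For reference, here is a stripped-down sketch of how the script uses Accelerate as far as I understand it (this is simplified and not the actual run_clm_no_trainer.py code; the ToyDataset, the toy texts, and the learning rate are placeholders I made up just so the snippet runs on its own):

import torch
from torch.utils.data import DataLoader, Dataset
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer


class ToyDataset(Dataset):
    """Tiny stand-in for the tokenized wikitext dataset used by the script."""

    def __init__(self, encodings):
        self.encodings = encodings

    def __len__(self):
        return self.encodings["input_ids"].size(0)

    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.encodings.items()}
        item["labels"] = item["input_ids"].clone()
        return item


accelerator = Accelerator(gradient_accumulation_steps=8)  # --gradient_accumulation_steps 8

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
encodings = tokenizer(["hello world"] * 8, return_tensors="pt")
train_dataloader = DataLoader(ToyDataset(encodings), batch_size=1)  # --per_device_train_batch_size 1

# Each rank first loads the full, unsharded model before FSDP wraps it.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# With distributed_type FSDP, prepare() wraps the model according to fsdp_config.
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

model.train()
for batch in train_dataloader:
    with accelerator.accumulate(model):
        loss = model(**batch).loss
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()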
Error file:
Updated accelerate env:
Copy-and-paste the text below in your GitHub issue
- Accelerate version: 0.18.0
- Platform: Linux-5.4.0-136-generic-x86_64-with-glibc2.10
- Python version: 3.8.12
- Numpy version: 1.22.2
- PyTorch version (GPU?): 1.13.1+cu116 (True)
- Accelerate default config:
- compute_environment: LOCAL_MACHINE
- distributed_type: FSDP
- mixed_precision: bf16
- use_cpu: False
- num_processes: 6
- machine_rank: 0
- num_machines: 0
- rdzv_backend: static
- same_network: True
- main_training_function: main
- fsdp_config: {'fsdp_auto_wrap_policy': 'TRANSFORMER_BASED_WRAP', 'fsdp_backward_prefetch_policy': 'BACKWARD_PRE', 'fsdp_offload_params': False, 'fsdp_sharding_strategy': 1, 'fsdp_state_dict_type': 'FULL_STATE_DICT', 'fsdp_transformer_layer_cls_to_wrap': 'GPTJBlock'}
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- tpu_env: []
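In case it helps with triage, this is roughly what I understand the fsdp_config above to translate to in plain torch.distributed.fsdp terms. This is my own sketch, not the actual Accelerate code path; the single-process group, the master address/port, and the bf16 MixedPrecision mapping are assumptions I added so the snippet can run standalone:

import functools
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    BackwardPrefetch,
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
from transformers.models.gptj.modeling_gptj import GPTJBlock

# Single-process group only so this runs standalone; accelerate launch
# normally spawns 6 processes (num_processes: 6) and sets this up itself.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("nccl", rank=0, world_size=1)
torch.cuda.set_device(0)

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")

# fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP with
# fsdp_transformer_layer_cls_to_wrap: GPTJBlock
auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={GPTJBlock},
)

wrapped_model = FSDP(
    model.cuda(),
    sharding_strategy=ShardingStrategy.FULL_SHARD,    # fsdp_sharding_strategy: 1
    backward_prefetch=BackwardPrefetch.BACKWARD_PRE,  # fsdp_backward_prefetch_policy: BACKWARD_PRE
    auto_wrap_policy=auto_wrap_policy,
    mixed_precision=MixedPrecision(                   # mixed_precision: bf16 (my assumption of the mapping)
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    ),
    cpu_offload=None,                                 # fsdp_offload_params: False
)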