- Accelerate version: 0.24.0
- Platform: Linux-5.4.0-150-generic-x86_64-with-glibc2.27
- Python version: 3.10.13
- Numpy version: 1.26.1
- PyTorch version (GPU?): 2.1.0+cu121 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 1007.80 GB
- GPU type: NVIDIA A100 80GB PCIe
- Accelerate default config:
- compute_environment: LOCAL_MACHINE
- distributed_type: MULTI_GPU
- mixed_precision: no
- use_cpu: False
- debug: False
- num_processes: 4
- machine_rank: 0
- num_machines: 1
- gpu_ids: 1,2,3,4
- rdzv_backend: static
- same_network: True
- main_training_function: main
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- tpu_env:
- dynamo_config: {'dynamo_backend': 'INDUCTOR', 'dynamo_mode': 'default', 'dynamo_use_dynamic': True, 'dynamo_use_fullgraph': False}
Run command as: CCL_P2P_DISABLE=1 CUDA_LAUNCH_BLOCKING=1 accelerate launch 01.py --max_memory_per_gpu 20GB
ERROR at: cos = cos[position_ids].unsqueeze(1)  # [bs, 1, seq_len, dim]
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
…/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [23,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
…/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [23,0,0], thread: [33,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
…/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [23,0,0], thread: [34,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
…/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [23,0,0], thread: [35,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
…/aten/src/ATen/native/cuda/IndexKernel.c
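The assert fires inside the fancy-indexing kernel, so the root cause is a `position_ids` value that falls outside the first dimension of the `cos` cache. One way to localize this before the kernel launches is to validate the indices on the Python side. Below is a minimal sketch (the function name `check_index_bounds` and the tensor sizes are made up for illustration; they are not from the original report):

```python
import torch

def check_index_bounds(table: torch.Tensor, position_ids: torch.Tensor) -> None:
    """Raise a readable Python-side IndexError instead of letting a
    device-side assert kill the CUDA context."""
    max_len = table.size(0)
    bad = (position_ids < 0) | (position_ids >= max_len)
    if bad.any():
        raise IndexError(
            f"position_ids contains values outside [0, {max_len}); "
            f"offending values: {sorted(position_ids[bad].unique().tolist())}"
        )

# Hypothetical rotary-embedding cos cache sized for a 4096-token context.
cos = torch.randn(4096, 128)

ok_ids = torch.tensor([[0, 1, 2, 3]])
check_index_bounds(cos, ok_ids)  # passes silently

# Positions past the cache length trigger the same out-of-bounds condition
# the CUDA assert is reporting, but as a catchable CPU-side error.
bad_ids = torch.tensor([[4094, 4095, 4096, 4097]])
try:
    check_index_bounds(cos, bad_ids)
except IndexError as e:
    print(e)
```

Running the same indexing on CPU (or keeping `CUDA_LAUNCH_BLOCKING=1`, as in the command above) also surfaces the offending value synchronously, which usually points at a sequence longer than the model's `max_position_embeddings`.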