Ok, I ran that, answered the questions, restarted, and re-ran, but I still got the SIGSEGV error. Here’s the new `accelerate env` output:
- `Accelerate` version: 0.9.0
- Platform: Linux-5.4.0-1060-aws-x86_64-with-glibc2.27
- Python version: 3.9.9
- Numpy version: 1.22.4
- PyTorch version (GPU?): 1.11.0+cu115 (True)
- `Accelerate` default config:
    - compute_environment: LOCAL_MACHINE
    - distributed_type: MULTI_GPU
    - mixed_precision: fp16
    - use_cpu: False
    - num_processes: 4
    - machine_rank: 0
    - num_machines: 1
    - main_process_ip: None
    - main_process_port: None
    - main_training_function: main
    - deepspeed_config: {}
    - fsdp_config: {}

Update: I tried reconfiguring with FSDP enabled (accepting the defaults presented thereafter), restarting, and re-running. Same SIGSEGV error.