Local variable 'gradient_accumulation_steps' referenced before assignment

I'm trying to fine-tune llama-7b following this tutorial (GPT4ALL: Train with local data for Fine-tuning | by Mark Zhou | Medium).

My accelerate configuration:

$ accelerate env
[2023-08-20 19:22:40,268] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)

Copy-and-paste the text below in your GitHub issue

- `Accelerate` version: 0.21.0
- Platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Python version: 3.10.12
- Numpy version: 1.25.2
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 15.49 GB
- GPU type: NVIDIA GeForce RTX 4070 Laptop GPU
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: DEEPSPEED
        - mixed_precision: fp16
        - use_cpu: True
        - num_processes: 1
        - machine_rank: 0
        - num_machines: 1
        - rdzv_backend: static
        - same_network: True
        - main_training_function: main
        - deepspeed_config: {'gradient_accumulation_steps': 1, 'gradient_clipping': 1.0, 'offload_optimizer_device': 'cpu', 'offload_param_device': 'cpu', 'zero3_init_flag': False, 'zero_stage': 0}
        - ipex_config: {'ipex': True}
        - downcast_bf16: no
        - tpu_use_cluster: False
        - tpu_use_sudo: False
        - tpu_env: []

However, every time I try training I get this error:

accelerate launch --cpu --num_processes=8 --num_machines=1 --machine_rank=0 train.py --config configs/train/finetune_frog.yaml
[2023-08-20 19:24:08,017] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
        `--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance when training on CPUs
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2023-08-20 19:24:11,530] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
wandb: Currently logged in as: wung8. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.8
wandb: Run data is saved locally in /mnt/c/Users/micha/Documents/CS Projects/frothy_mug/gpt4all/gpt4all-training/wandb/run-20230820_192415-wtr65d09
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run spring-firebrand-34
wandb: ⭐️ View project at https://wandb.ai/wung8/gpt4all-gpt4all-training
wandb: 🚀 View run at https://wandb.ai/wung8/gpt4all-gpt4all-training/runs/wtr65d09
{'model_name': 'decapoda-research/llama-7b-hf', 'tokenizer_name': 'gpt2', 'gradient_checkpointing': True, 'save_name': 'nomic-ai/gpt4all-full-multi-turn', 'streaming': False, 'num_proc': 64, 'dataset_path': 'training.jsonl', 'max_length': 1024, 'batch_size': 1, 'lr': 2e-05, 'min_lr': 0, 'weight_decay': 0.0, 'eval_every': 500, 'eval_steps': 105, 'save_every': 500, 'log_grads_every': 100, 'output_dir': 'model_data/gpt4all_frog', 'checkpoint': None, 'lora': False, 'warmup_steps': 500, 'num_epochs': 2, 'wandb': True, 'wandb_entity': None, 'wandb_project_name': None, 'seed': 42}
Using 1 GPUs
Using pad_token, but it is not set yet.
Reading files ['training.jsonl']
num_proc must be <= 21. Reducing num_proc to 21 for dataset of size 21.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████| 33/33 [00:48<00:00,  1.47s/it]
Len of train_dataloader: 387
Traceback (most recent call last):
  File "/mnt/c/Users/micha/Documents/CS Projects/frothy_mug/gpt4all/gpt4all-training/train.py", line 241, in <module>
    train(accelerator, config=config)
  File "/mnt/c/Users/micha/Documents/CS Projects/frothy_mug/gpt4all/gpt4all-training/train.py", line 91, in train
    total_num_steps = (len(train_dataloader) / gradient_accumulation_steps) * (config["num_epochs"])
UnboundLocalError: local variable 'gradient_accumulation_steps' referenced before assignment
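
My guess is that train.py only ever assigns gradient_accumulation_steps inside a DeepSpeed-specific branch, so the line at 91 blows up whenever the plugin is missing. Paraphrasing from the traceback (not the exact source), something like:

if accelerator.state.deepspeed_plugin is not None:
    # only assigned here, from the DeepSpeed config
    gradient_accumulation_steps = accelerator.state.deepspeed_plugin.deepspeed_config[
        "gradient_accumulation_steps"
    ]

# line 91 -- raises UnboundLocalError when the branch above was skipped
total_num_steps = (len(train_dataloader) / gradient_accumulation_steps) * (config["num_epochs"])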

I’ve tried adding `--config_file (path)` and `--gradient_accumulation_steps=1` to the launch command, but I get the same result.

From what I can tell, it’s an issue with the DeepSpeed plugin, which for some reason doesn’t exist:

>>> from accelerate import Accelerator
>>> accelerator = Accelerator()
>>> str(accelerator.state.deepspeed_plugin)
'None'
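
A patch that should at least get past line 91 is to fall back to a default when the plugin is missing (my own sketch, not upstream code; the fallback of 1 matches the gradient_accumulation_steps in my deepspeed_config above):

if accelerator.state.deepspeed_plugin is not None:
    gradient_accumulation_steps = accelerator.state.deepspeed_plugin.deepspeed_config[
        "gradient_accumulation_steps"
    ]
else:
    # my own fallback: use the value from my accelerate config (1, see above)
    gradient_accumulation_steps = 1

total_num_steps = (len(train_dataloader) / gradient_accumulation_steps) * (config["num_epochs"])

But that only masks the symptom: my guess is that launching with --cpu (and use_cpu: True in the config) makes Accelerate drop the DeepSpeed plugin entirely, since as far as I know the DeepSpeed integration doesn't run on CPU.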

Is it possible that this is just a Windows/WSL issue, or have I forgotten to do something?