I think it’s a real bug, but there doesn’t seem to be an existing issue. Something like this?
**Title:** `AcceleratedScheduler` doesn’t step during gradient accumulation unless `adjust_scheduler=True` is explicitly passed (docs say it should “always step”)
**Summary**
When using gradient accumulation and setting only `num_steps` (via `Accelerator(gradient_accumulation_steps=...)` or `GradientAccumulationPlugin(num_steps=...)`), the scheduler does **not** advance on accumulation micro-steps. This contradicts the docs, which state that Accelerate will **always** step the scheduler to account for accumulation. ([Hugging Face](https://huggingface.co/docs/accelerate/en/package_reference/torch_wrappers "DataLoaders, Optimizers, and Schedulers"))
**What actually happens**
* `GradientState.adjust_scheduler` reads from `plugin_kwargs` and falls back to `False` when the key is missing.
* The `Accelerator.gradient_accumulation_steps` setter only injects `{"num_steps": N}` into `GradientState.plugin_kwargs`. It does **not** inject `adjust_scheduler=True`.
* `AcceleratedScheduler.step()` only increments the underlying scheduler’s `_step_count` during accumulation when `GradientState.adjust_scheduler` is `True`. If the flag is `False` or missing, no accumulation-time step is recorded (see the sketch after this list). ([gemfury.com](https://gemfury.com/emaballarin/python%3Aaccelerate/-/content/state.py "state.py · emaballarin / accelerate v1.11.0.dev0 - python ..."))
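For readers skimming, the gating boils down to roughly the following. This is a paraphrase of the cited code paths with hypothetical `*Sketch` class names, not the actual `accelerate` source:

```python
# Paraphrase of the gating described above (simplified; not the actual accelerate source).
class GradientStateSketch:
    def __init__(self, plugin_kwargs):
        self.plugin_kwargs = plugin_kwargs  # e.g. {"num_steps": 2} when only num_steps is supplied
        self.sync_gradients = True

    @property
    def adjust_scheduler(self):
        # Falls back to False when the key was never injected.
        return self.plugin_kwargs.get("adjust_scheduler", False)


class AcceleratedSchedulerSketch:
    def __init__(self, scheduler, gradient_state):
        self.scheduler = scheduler
        self.gradient_state = gradient_state

    def step(self, *args, **kwargs):
        if not self.gradient_state.sync_gradients:
            # Accumulation micro-step: counted only when adjust_scheduler is True.
            if self.gradient_state.adjust_scheduler:
                self.scheduler._step_count += 1
            return
        # Sync step: the wrapped scheduler actually steps.
        self.scheduler.step(*args, **kwargs)
```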
**Why this contradicts docs**
Docs promise: “When performing gradient accumulation scheduler lengths should not be changed accordingly, Accelerate will **always step** the scheduler to account for it.” The documented defaults for `GradientAccumulationPlugin` also show `adjust_scheduler=True`. The current runtime violates this unless the flag is explicitly provided. ([Hugging Face](https://huggingface.co/docs/accelerate/en/package_reference/torch_wrappers "DataLoaders, Optimizers, and Schedulers"))
**Minimal repro**
```python
# deps: accelerate>=1.11.0, torch
# refs:
#   Docs (always step): https://huggingface.co/docs/accelerate/en/package_reference/torch_wrappers
#   Plugin defaults:    https://huggingface.co/docs/accelerate/en/package_reference/utilities
from accelerate import Accelerator
from accelerate.utils import GradientAccumulationPlugin
import torch

m = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(m.parameters(), lr=1.0)
sched = torch.optim.lr_scheduler.LinearLR(opt, start_factor=1.0, end_factor=0.5, total_iters=8)

# Only num_steps provided → plugin_kwargs == {"num_steps": 2}
acc = Accelerator(gradient_accumulation_plugin=GradientAccumulationPlugin(num_steps=2))
m, opt, sched = acc.prepare(m, opt, sched)

lrs = []
for _ in range(8):
    with acc.accumulate(m):
        y = m(torch.randn(4, 2)).sum()
        acc.backward(y)
        opt.step()
        opt.zero_grad()
        sched.step()  # During accumulation, no _step_count increment if adjust_scheduler=False
    lrs.append(opt.param_groups[0]["lr"])

print(lrs)  # LR decays too slowly vs. per-micro-step stepping
```
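The missing key can also be observed at runtime. A quick check to run right after `acc.prepare(...)` in the repro above, assuming `Accelerator.gradient_state` exposes the shared `GradientState` instance as in current releases:

```python
# Runtime confirmation of the state described in this report.
print(acc.gradient_state.plugin_kwargs)     # per this report: {"num_steps": 2}
print(acc.gradient_state.adjust_scheduler)  # per this report: False (docs say the default is True)
```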
**Expected behavior**
* Scheduler advances on every accumulation micro-step without requiring users to pass `adjust_scheduler=True`, matching the docs and the documented defaults. ([Hugging Face](https://huggingface.co/docs/accelerate/en/package_reference/torch_wrappers "DataLoaders, Optimizers, and Schedulers"))
**Root cause (code-level)**
* `GradientState.adjust_scheduler` → `self.plugin_kwargs.get("adjust_scheduler", False)` defaults to `False` if the key is absent (see the check below). ([gemfury.com](https://gemfury.com/emaballarin/python%3Aaccelerate/-/content/state.py "state.py · emaballarin / accelerate v1.11.0.dev0 - python ..."))
* `Accelerator.gradient_accumulation_steps` setter updates only `{"num_steps": N}`. ([gemfury.com](https://gemfury.com/emaballarin/python%3Aaccelerate/accelerate-1.12.0.dev0-py3-none-any.whl/content/accelerator.py "accelerator.py · emaballarin / accelerate v1.12.0.dev0 - python package | Gemfury"))
* `AcceleratedScheduler.step()` gates accumulation-time `_step_count` bump on `gradient_state.adjust_scheduler`. ([gemfury.com](https://gemfury.com/emaballarin/python%3Aaccelerate/accelerate-1.12.0.dev0-py3-none-any.whl/content/scheduler.py "scheduler.py · emaballarin / accelerate v1.12.0.dev0 - python package | Gemfury"))
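The key appears to go missing because `GradientAccumulationPlugin.to_kwargs()` serializes only fields that differ from the dataclass defaults, so the documented `adjust_scheduler=True` default never reaches `GradientState.plugin_kwargs`. This is an inference from the observed `{"num_steps": 2}` in the repro, so treat it as an assumption:

```python
from accelerate.utils import GradientAccumulationPlugin

# Only non-default fields survive serialization, so "adjust_scheduler" is absent
# even though the plugin's documented default is True.
print(GradientAccumulationPlugin(num_steps=2).to_kwargs())
# -> {'num_steps': 2}
```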
**Proposed fixes**
Any one of these aligns the runtime with the docs and keeps explicit overrides working (a regression-test sketch follows the list):
1. Make the property default `True`:
```diff
# src/accelerate/state.py
- return self.plugin_kwargs.get("adjust_scheduler", False)
+ return self.plugin_kwargs.get("adjust_scheduler", True)
```
(Respects explicit `adjust_scheduler=False`, fixes implicit cases.) ([gemfury.com](https://gemfury.com/emaballarin/python%3Aaccelerate/-/content/state.py "state.py · emaballarin / accelerate v1.11.0.dev0 - python ..."))
2. Materialize documented plugin defaults when no plugin kwargs exist:
```diff
# in GradientState.__init__
- self.plugin_kwargs = (gap.to_kwargs() if gap is not None else {})
+ self.plugin_kwargs = (gap.to_kwargs() if gap is not None else {"adjust_scheduler": True, "sync_with_dataloader": True})
```
Ensures keys exist and match docs by default. ([Hugging Face](https://huggingface.co/docs/accelerate/package_reference/utilities "Utility functions and classes"))
3. When users set `Accelerator(gradient_accumulation_steps=N)`, also inject the documented defaults:
```diff
# accelerator.py gradient_accumulation_steps.setter
- self.gradient_state.plugin_kwargs.update({"num_steps": gradient_accumulation_steps})
+ self.gradient_state.plugin_kwargs.update({
+     "num_steps": gradient_accumulation_steps,
+     "adjust_scheduler": True,
+     "sync_with_dataloader": True,
+ })
```
Keeps behavior consistent whether users pass a plugin or a bare integer. ([gemfury.com](https://gemfury.com/emaballarin/python%3Aaccelerate/accelerate-1.12.0.dev0-py3-none-any.whl/content/accelerator.py "accelerator.py · emaballarin / accelerate v1.12.0.dev0 - python package | Gemfury"))
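Whichever fix lands, a regression test along these lines could guard the documented default. This is a sketch, not an existing test; file placement is hypothetical and it assumes `Accelerator.gradient_state` / `GradientState.adjust_scheduler` are accessible as described above:

```python
# Regression-test sketch (placement hypothetical, e.g. tests/test_scheduler.py).
from accelerate import Accelerator
from accelerate.utils import GradientAccumulationPlugin


def test_adjust_scheduler_defaults_to_true_with_only_num_steps():
    accelerator = Accelerator(gradient_accumulation_plugin=GradientAccumulationPlugin(num_steps=2))
    # Docs: adjust_scheduler defaults to True; the runtime should report the same
    # even though only num_steps was provided.
    assert accelerator.gradient_state.adjust_scheduler is True
```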
**Workaround for users**
Pass the flag explicitly:
```python
GradientAccumulationPlugin(num_steps=2, adjust_scheduler=True)
```
This forces correct stepping today and matches the docs’ stated default. ([Hugging Face](https://huggingface.co/docs/accelerate/package_reference/utilities "Utility functions and classes"))
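For completeness, wired into an `Accelerator` (the same construction as in the repro above):

```python
from accelerate import Accelerator
from accelerate.utils import GradientAccumulationPlugin

# Passing the flag explicitly restores the documented behavior today.
plugin = GradientAccumulationPlugin(num_steps=2, adjust_scheduler=True)
accelerator = Accelerator(gradient_accumulation_plugin=plugin)
```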
**Environment**
Reproduced against the current stable docs and code paths; the behavior is visible with `accelerate==1.11.x` and `main`, as of 2025-11-07 (JST). Key code paths are cited above. ([Hugging Face](https://huggingface.co/docs/accelerate/en/package_reference/torch_wrappers "DataLoaders, Optimizers, and Schedulers"))
**Related but different**
* Prior reports discuss the opposite symptom (“scheduler always steps under accumulation”). Useful historical context but not this bug. ([github.com](https://github.com/huggingface/accelerate/issues/963 "Scheduler always steps when training with gradient ..."))