I am facing an error when I use SFTTrainer

Imran1 · June 28, 2023, 4:35pm

Hi, when I start training, I get the following error.

trainer.train()

Here is the error:
RuntimeError: unscale_() has already been called on this optimizer since the last update().


╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <cell line: 1>:1                                                                              │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1645 in train                    │
│                                                                                                  │
│   1642 │   │   inner_training_loop = find_executable_batch_size(                                 │
│   1643 │   │   │   self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size  │
│   1644 │   │   )                                                                                 │
│ ❱ 1645 │   │   return inner_training_loop(                                                       │
│   1646 │   │   │   args=args,                                                                    │
│   1647 │   │   │   resume_from_checkpoint=resume_from_checkpoint,                                │
│   1648 │   │   │   trial=trial,                                                                  │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1987 in _inner_training_loop     │
│                                                                                                  │
│   1984 │   │   │   │   │   │   │   │   args.max_grad_norm,                                       │
│   1985 │   │   │   │   │   │   │   )                                                             │
│   1986 │   │   │   │   │   │   else:                                                             │
│ ❱ 1987 │   │   │   │   │   │   │   self.accelerator.clip_grad_norm_(                             │
│   1988 │   │   │   │   │   │   │   │   model.parameters(),                                       │
│   1989 │   │   │   │   │   │   │   │   args.max_grad_norm,                                       │
│   1990 │   │   │   │   │   │   │   )                                                             │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py:1893 in clip_grad_norm_        │
│                                                                                                  │
│   1890 │   │   │   # `accelerator.backward(loss)` is doing that automatically. Therefore, its i  │
│   1891 │   │   │   # We cannot return the gradient norm because DeepSpeed does it.               │
│   1892 │   │   │   return None                                                                   │
│ ❱ 1893 │   │   self.unscale_gradients()                                                          │
│   1894 │   │   return torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=norm_type)  │
│   1895 │                                                                                         │
│   1896 │   def clip_grad_value_(self, parameters, clip_value):                                   │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py:1856 in unscale_gradients      │
│                                                                                                  │
│   1853 │   │   │   for opt in optimizer:                                                         │
│   1854 │   │   │   │   while isinstance(opt, AcceleratedOptimizer):                              │
│   1855 │   │   │   │   │   opt = opt.optimizer                                                   │
│ ❱ 1856 │   │   │   │   self.scaler.unscale_(opt)                                                 │
│   1857 │                                                                                         │
│   1858 │   def clip_grad_norm_(self, parameters, max_norm, norm_type=2):                         │
│   1859 │   │   """                                                                               │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py:275 in unscale_            │
│                                                                                                  │
│   272 │   │   optimizer_state = self._per_optimizer_states[id(optimizer)]                        │
│   273 │   │                                                                                      │
│   274 │   │   if optimizer_state["stage"] is OptState.UNSCALED:                                  │
│ ❱ 275 │   │   │   raise RuntimeError("unscale_() has already been called on this optimizer sin   │
│   276 │   │   elif optimizer_state["stage"] is OptState.STEPPED:                                 │
│   277 │   │   │   raise RuntimeError("unscale_() is being called after step().")                 │
│   278                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: unscale_() has already been called on this optimizer since the last update().
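
The last frame of the traceback shows the check that raises: `unscale_` refuses to run when the optimizer's stage is already `UNSCALED`. Below is a toy model of that check, only to illustrate the state machine. `ToyScaler` and `second_unscale_error` are hypothetical names, not PyTorch code, and the sketch assumes that `step()` and `update()` move the stage as the error messages in the traceback suggest.

```python
from enum import Enum


class OptState(Enum):
    READY = 0
    UNSCALED = 1
    STEPPED = 2


class ToyScaler:
    """Hypothetical stand-in that mimics only the stage check from the traceback."""

    def __init__(self):
        self.stage = OptState.READY

    def unscale_(self):
        # Same two guards that appear in the grad_scaler.py frame above.
        if self.stage is OptState.UNSCALED:
            raise RuntimeError(
                "unscale_() has already been called on this optimizer "
                "since the last update()."
            )
        elif self.stage is OptState.STEPPED:
            raise RuntimeError("unscale_() is being called after step().")
        self.stage = OptState.UNSCALED

    def step(self):
        # Assumption: stepping marks the optimizer as STEPPED.
        self.stage = OptState.STEPPED

    def update(self):
        # Assumption: update() resets the stage for the next iteration.
        self.stage = OptState.READY


def second_unscale_error():
    """Call unscale_() twice without update() and return the error message."""
    scaler = ToyScaler()
    scaler.unscale_()
    try:
        scaler.unscale_()
    except RuntimeError as err:
        return str(err)
    return None


print(second_unscale_error())
```

In this model, the error appears whenever gradients are unscaled a second time before `update()` runs, which matches the message reported above.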

muellerzr · June 28, 2023, 5:58pm

Try installing from main; it should be fixed there:

!pip install git+https://github.com/huggingface/transformers git+https://github.com/huggingface/accelerate
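
After reinstalling, it may help to confirm which versions the environment actually sees. This is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical, and the package list (including `trl` and `torch`) is an assumption about what is installed here. In a notebook, packages that were already imported stay loaded until the runtime is restarted, so restart before re-running `trainer.train()`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("transformers", "accelerate", "trl", "torch")):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so the versions can be pasted into a reply.
for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```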
