Are the performance tricks from v4.18.0 relocated on the main branch site?

I was wondering what happened to the tips/tricks recommended on Performance and Scalability: How To Fit a Bigger Model and Train It Faster, since it's been replaced by a more comprehensive page on Performance and Scalability, but I couldn't find the same tricks:

  • Adafactor
  • 8-bit Adam
  • gradient checkpointing
  • gradient accumulation

Are the tips/tricks from Performance and Scalability: How To Fit a Bigger Model and Train It Faster still relevant with the latest version of transformers?

Or have they been moved to another location on the main branch site?

cc @lvwerra and @stas who reorganized this content.

Hi @alvations, these tricks are now in the one-GPU section: Efficient Training on a Single GPU.
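
For anyone landing on this thread later, here is a minimal sketch (not taken from the docs page itself) of how all four tricks can be switched on through `TrainingArguments`. It assumes a reasonably recent transformers release where `optim` accepts these values; the 8-bit optimizer additionally requires the bitsandbytes package, and `output_dir` / batch sizes below are placeholder values:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",                # placeholder output directory
    per_device_train_batch_size=4,   # small per-device batch ...
    gradient_accumulation_steps=4,   # ... accumulated to an effective batch of 16
    gradient_checkpointing=True,     # recompute activations to save memory
    optim="adafactor",               # Adafactor; use "adamw_bnb_8bit" for 8-bit Adam
)
```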


Thanks @lvwerra for the pointers to the subpage!