I was wondering what happened to the tips/tricks recommended on "Performance and Scalability: How To Fit a Bigger Model and Train It Faster". It has been replaced by a more comprehensive page on Performance and Scalability, but I couldn't find the same tricks there:
- adafactor
- 8bit adam
- gradient checkpointing
- accumulate gradients
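For context, here is how I understood those four tricks map to `Trainer` settings. This is just a sketch of the keyword arguments as I remember them from recent docs (flag names like `optim="adafactor"` and `"adamw_bnb_8bit"` are my assumption and may differ across versions):

```python
# Sketch: the four tricks expressed as TrainingArguments keyword arguments.
# Flag names are assumed from recent transformers versions and may differ in older ones.
training_kwargs = dict(
    output_dir="out",
    optim="adafactor",              # Adafactor optimizer; "adamw_bnb_8bit" would select 8-bit Adam (needs bitsandbytes)
    gradient_checkpointing=True,    # trade extra compute for lower activation memory
    gradient_accumulation_steps=4,  # accumulate gradients over 4 micro-batches before each optimizer step
    per_device_train_batch_size=2,  # effective batch size = 2 * 4 = 8 per device
)

# These would then be passed straight to TrainingArguments, e.g.:
# from transformers import TrainingArguments
# args = TrainingArguments(**training_kwargs)
print(sorted(training_kwargs))
```

Is this roughly how these are meant to be enabled now, or has the recommended approach changed?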
Are the tips/tricks from "Performance and Scalability: How To Fit a Bigger Model and Train It Faster" still relevant with the latest version of transformers? Or have they been moved to another location on the main branch site?