Why is activation memory computed through an experiment rather than from a formula in the DeepSpeed autotuner?

kmehant · May 6, 2024, 6:00am

I am trying to understand the reasoning behind the design choice in the DeepSpeed autotuner of measuring activation memory by running an experiment rather than computing it from a formula.
