Explanation of the default "auto" values for DeepSpeed stage 3?

Hi, I would like to know how the values that DeepSpeed stage 3 parameters receive when set to "auto" were determined. They seem to work quite well, but I can't find any documentation of their origins.

What experiments or computations were done to land on the reduce_bucket_size, stage3_prefetch_bucket_size, and stage3_param_persistence_threshold formulas below?

self.fill_only("zero_optimization.reduce_bucket_size", hidden_size * hidden_size)
if self.is_zero3():
    # automatically assign the optimal config values based on model config
    self.fill_only("zero_optimization.stage3_prefetch_bucket_size", 0.9 * hidden_size * hidden_size)
    self.fill_only("zero_optimization.stage3_param_persistence_threshold", 10 * hidden_size)
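
To make the question concrete, here is a minimal sketch of what these formulas evaluate to for a given hidden size. The helper name `zero3_auto_values` and the example `hidden_size` of 1024 are my own, for illustration only; this is not part of the library, it just restates the arithmetic from the snippet above.

```python
def zero3_auto_values(hidden_size: int) -> dict:
    """Hypothetical helper: evaluates the three "auto" formulas quoted above."""
    return {
        # hidden_size squared
        "reduce_bucket_size": hidden_size * hidden_size,
        # 90% of hidden_size squared (a float, as in the quoted snippet)
        "stage3_prefetch_bucket_size": 0.9 * hidden_size * hidden_size,
        # ten times the hidden size
        "stage3_param_persistence_threshold": 10 * hidden_size,
    }


if __name__ == "__main__":
    # 1024 is an arbitrary example value, not taken from any specific model.
    for name, value in zero3_auto_values(1024).items():
        print(f"{name}: {value}")
```

So two of the values scale quadratically with the hidden size and one scales linearly, which is the part I would like to understand the reasoning behind.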

Maybe @stas? I saw that you have worked on much of the deepspeed.py code.

Thank you!

A response can be found here