Hi, I would like to know how the default values of the DeepSpeed ZeRO stage 3 parameters are determined when their config fields are set to "auto". They seem to work quite well, but I can’t find any documentation of their origins.
What experiments or computations were done to land on the reduce_bucket_size, stage3_prefetch_bucket_size, and stage3_param_persistence_threshold formulas below?
    self.fill_only("zero_optimization.reduce_bucket_size", hidden_size * hidden_size)
    if self.is_zero3():
        # automatically assign the optimal config values based on model config
        self.fill_only("zero_optimization.stage3_prefetch_bucket_size", 0.9 * hidden_size * hidden_size)
        self.fill_only("zero_optimization.stage3_param_persistence_threshold", 10 * hidden_size)
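To make the question concrete, here is a minimal sketch of what those three formulas evaluate to for a given hidden size. The helper name auto_zero3_values and the example hidden size of 4096 are my own assumptions for illustration; only the arithmetic is taken from the snippet above, and this is not the actual integration code.

```python
def auto_zero3_values(hidden_size: int) -> dict:
    """Evaluate the three "auto" formulas quoted above (hypothetical helper)."""
    return {
        "reduce_bucket_size": hidden_size * hidden_size,
        "stage3_prefetch_bucket_size": 0.9 * hidden_size * hidden_size,
        "stage3_param_persistence_threshold": 10 * hidden_size,
    }


# 4096 is an assumed example hidden size, not a value from the source.
values = auto_zero3_values(4096)
for name, value in values.items():
    print(f"{name}: {value:,.0f}")
```

With a hidden size of 4096 this gives 16,777,216 for reduce_bucket_size, roughly 15.1 million for stage3_prefetch_bucket_size, and 40,960 for stage3_param_persistence_threshold. The two bucket sizes grow quadratically with the hidden size, while the persistence threshold grows only linearly.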
Maybe @stas? I saw that you have worked on much of the deepspeed.py code.
Thank you!