2B Model Fills Up Memory on 4xA100s

Could it be that DeepSpeed has a bug…?