DeepSpeed error: a leaf Variable that requires grad is being used in an in-place operation

I only get this error when I use

 process_group_backend="gloo",
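
For context, the message itself comes from PyTorch autograd and can be triggered without DeepSpeed or gloo. The following is a minimal sketch, assuming only that `torch` is installed. The function names are hypothetical, and it does not reproduce the gloo-specific failure; it only shows what kind of operation raises this error and that the same update is accepted under `torch.no_grad()`.

```python
import torch


def inplace_on_leaf_fails() -> str:
    """Modify a leaf tensor that requires grad in place and return the error text."""
    w = torch.ones(3, requires_grad=True)  # leaf tensor tracked by autograd
    try:
        w.add_(1.0)  # in-place update while autograd is recording
    except RuntimeError as exc:
        return str(exc)
    return ""


def inplace_under_no_grad() -> torch.Tensor:
    """Apply the same in-place update with autograd recording disabled."""
    w = torch.ones(3, requires_grad=True)
    with torch.no_grad():
        w.add_(1.0)  # allowed: the update is not recorded in the graph
    return w


message = inplace_on_leaf_fails()
updated = inplace_under_no_grad()
print(message)
print(updated)
```

If the traceback in the gloo run points at an in-place call on a parameter (for example a broadcast or copy into model weights), that would be consistent with this pattern, but that is a guess until the full stack trace is checked.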