PR #130377, which you linked, was merged on July 10, 2024 and is included in PyTorch 2.4.0 and later. However, it fixes a different issue (zero-size buffer copies).
The error you are seeing comes from a separate MPS bug:
Compute Function(sub_dense_scalar_long_long): Read-only bytes are being bound at index 2 to a shader argument with write access enabled
This appears to be a CSM-specific issue with certain Metal shader operations, and it still isn’t fixed.
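If you want to rule out the zero-size buffer copy fix as a factor, a quick version check along these lines may help. This is only a sketch: the helper name `has_zero_size_copy_fix` is made up for illustration, and it only compares the release number, so it cannot tell whether a 2.4.0 dev/nightly build predates the merge.

```python
import re


def has_zero_size_copy_fix(version: str, minimum=(2, 4, 0)) -> bool:
    """Return True if a PyTorch version string is at least `minimum`.

    Hypothetical helper: it assumes the fix from PR #130377 is present in
    every build numbered 2.4.0 or later, which may not hold for early
    2.4.0 dev/nightly builds.
    """
    # Take the leading "major.minor.patch" and ignore suffixes like "+cu121".
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups()) >= minimum
```

You would call it as `has_zero_size_copy_fix(torch.__version__)` after `import torch`. If it returns True and the `sub_dense_scalar_long_long` error still shows up, that is consistent with the two problems being unrelated.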