Is this a torch.sparse bug? Any workaround?

I encountered this bug in a more complex LLM setting, but it boils down to the following failure:

import torch

a = torch.tensor([1.])
b = torch.tensor([2., 3.])
print(a * b)
print((a.to_sparse() * b).to_dense())
# outputs:
# tensor([2., 3.])
# tensor([2., 0.])

I would expect the result of element-wise multiplication of a and b to be independent of their tensor representation (dense or sparse), but it is clearly not the case (in Google Colab, at least).

Or am I missing something?

Is there any workaround for a sparse-by-dense tensor multiplication? In the original problem, a is large, and therefore both a and the result should remain in the sparse representation.


It seems like a bug, or perhaps a limitation.


by HuggingChat:

The issue you’re encountering is due to how PyTorch handles element-wise multiplication between sparse and dense tensors, particularly when broadcasting is involved. Sparse tensors in PyTorch are designed for efficient storage of sparse data but can have limitations when performing operations that require broadcasting. In your example, the sparse tensor a does not broadcast correctly across the dense tensor b, leading to unexpected results.

Summary of the Issue:

  1. Dense Tensor Multiplication: When you multiply two dense tensors, broadcasting is handled automatically. For example, a one-element dense tensor is broadcast across every element of the other tensor.

  2. Sparse Tensor Multiplication: When you convert a tensor to sparse, the operation a_sparse * b does not broadcast the sparse tensor across all elements of the dense tensor. Instead, it only multiplies the stored (non-zero) elements of the sparse tensor with the corresponding elements of the dense tensor and leaves every other position zero (see the short illustration after this list).

  3. Workaround: To achieve the expected result, you can perform the multiplication while ensuring that broadcasting is handled correctly. Since sparse tensors do not inherently support broadcasting, you might need to adjust the dimensions or use alternative operations.
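
To see point 2 concretely with the tensors from the original post, here is a small sketch (the printed layout is the COO representation produced by to_sparse(); the output is shown approximately and may differ slightly across PyTorch versions):

import torch

a = torch.tensor([1.])
b = torch.tensor([2., 3.])

a_sparse = a.to_sparse()
print(a_sparse)
# tensor(indices=tensor([[0]]),
#        values=tensor([1.]),
#        size=(1,), nnz=1, layout=torch.sparse_coo)

# Only the stored element (index 0) is multiplied with b[0]; position 1 stays zero
print((a_sparse * b).to_dense())   # tensor([2., 0.])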

Solution:

One approach to ensure correct broadcasting is to convert both tensors to the same sparsity format before performing the multiplication. Here’s how you can work around the issue:

import torch

a = torch.tensor([1.])
b = torch.tensor([2., 3.])

# Convert 'a' and 'b' to sparse tensors
a_sparse = a.to_sparse()
b_sparse = b.to_sparse()

# Perform element-wise multiplication
result_sparse = a_sparse * b_sparse

# Convert the result back to dense
result_dense = result_sparse.to_dense()

print(a * b)          # tensor([2., 3.])
print(result_dense)   # tensor([2., 3.])

Explanation:

  • Converting to Sparse Tensors: By converting both a and b to sparse tensors, you ensure that their non-zero elements are properly aligned during multiplication.
  • Element-wise Multiplication: The multiplication operation now correctly handles the non-zero elements of both sparse tensors.
  • Conversion to Dense Tensor: Finally, converting the result back to a dense tensor gives the expected output.

Conclusion:

The issue is not a bug but a limitation of how PyTorch handles sparse tensor operations. By converting both tensors to sparse before multiplication, you can achieve the expected result and maintain sparse representation where needed.



Thanks John. Is this an AI-generated suggestion? It causes an error:

     10 # Perform element-wise multiplication
---> 11 result_sparse = a_sparse * b_sparse
     12 
     13 # Convert the result back to dense

RuntimeError: sparse_binary_op_intersection_cpu(): expects sparse inputs with equal dimensionality, number of sparse dimensions, and shape of sparse dimensions

It looks like it cannot do broadcasting. Strangely, when both tensors are sparse it throws an error, but when one is sparse and the other is dense it silently does the multiplication incorrectly. I would call that a bug.

One workaround is to repeat the smaller tensor along the required dimension so that no broadcasting is needed. It seems torch.stack() does it all in the sparse representation.
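
For the toy example above, here is a minimal sketch of that idea; I tile with torch.cat along the existing dimension so the shapes match exactly (in the original problem, torch.stack can build the repeated tensor along a new dimension while staying in the sparse representation):

import torch

a = torch.tensor([1.])
b = torch.tensor([2., 3.])

a_sparse = a.to_sparse()

# Tile the sparse tensor so its shape matches b exactly; no broadcasting needed
a_tiled = torch.cat([a_sparse] * b.shape[0])   # still sparse, shape (2,)

result = a_tiled * b            # sparse * dense with equal shapes
print(result.to_dense())        # tensor([2., 3.])
print(result.is_sparse)         # True; the result stays sparse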


Is this an AI-generated suggestion?

Yea.

It causes an error:

Oh…Sorry.

Hmm… Minor bug?


Thanks! I see this issue has already been discussed on the PyTorch forums.
