Optimum Pruning and Quantization Current Limitation

We are checking out the Huggingface Optimum. There are some issues that we would like to clarify:

  • Pruning does not always speed up the model, and it may even increase the model's storage size, which is unexpected.

  • Dynamic quantization works only on the CPU (running it on the GPU raises an error about a conflict between CPU and GPU devices).

Could someone familiar with this area explain this behavior? We have high hopes for Hugging Face Optimum as a model compression tool.

If more details are necessary, I would be glad to provide them.

Hi @samuelmat19

Pruning does not always speed up the model, and it may even increase the model's storage size, which is unexpected.

Currently, the supported pruning method is magnitude-based unstructured pruning, which replaces the values of the pruned weights with 0. Because the zeroed weights are still stored in the same dense tensors, the model size should not change, and no speed-up should be expected unless sparse storage formats or sparsity-aware kernels are used.
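The effect can be illustrated with a small NumPy sketch (hypothetical helper name, not Optimum's API): magnitude-based unstructured pruning zeroes the smallest-magnitude weights, but the dense tensor keeps its shape and byte size, so storage does not shrink.

```python
import numpy as np

def magnitude_prune(weights, amount=0.5):
    """Zero the smallest-magnitude entries (unstructured magnitude pruning)."""
    k = int(amount * weights.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value acts as the pruning threshold.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
pruned = magnitude_prune(w, amount=0.5)

# At least half the entries are now exactly 0 ...
num_zeros = int((pruned == 0).sum())
# ... yet the dense array occupies exactly as many bytes as before.
same_size = pruned.nbytes == w.nbytes
```

Standard dense matrix multiplication also does not skip zeros, which is why inference time is unchanged as well.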

Dynamic quantization works only on the CPU (running it on the GPU raises an error about a conflict between CPU and GPU devices).

Concerning dynamic quantization: unfortunately, at the moment PyTorch does not provide quantized operator implementations on CUDA; only CPU backends are available.

Hi @echarlaix

That is clear to me. I also opened an issue on Github, in which it was clarified.

Would it make sense to add some writings in the Optimum’s documentation, which emphasizes the fact that this pruning does not provide speed up or reduction in model size? I could imagine there are some people that are trying to use the pruning for speed up, and would be perplexed when it provides no speed up. This happened to me, and spent some time to figure out what I did wrong.

Yes and as stated in the issue you are referring to, we are planning to add additional pruning methods in the future, bear in mind that this is still a work in progress. We will also make our documentation more detailed in order to make things more understandable for the user.

@echarlaix that is lovely and I appreciate the work being put here in model optimization. Also, if there is something I can contribute in the library, do let me know, I would be glad to help.

Cheers.