Hey everyone! I hope you’re all having a great day.
So, I've been experimenting with the AutoGPTQ library, trying to quantize a code LLM from the BigCode family (e.g. StarCoder).
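For context, here's roughly what I'm doing, a minimal sketch following AutoGPTQ's standard quantization flow; the model ID, output directory, and calibration text are placeholders for my actual setup:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_id = "bigcode/starcoder"  # placeholder for the BigCode model I'm using
quantized_model_dir = "starcoder-4bit-gptq"  # placeholder output path

quantize_config = BaseQuantizeConfig(
    bits=4,          # quantize weights to 4-bit
    group_size=128,  # commonly recommended group size
    desc_act=False,  # faster inference at a small perplexity cost
)

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_id, use_fast=True)

# GPTQ needs a small calibration set; in practice these would be
# representative code samples, not a single toy snippet
examples = [
    tokenizer("def fibonacci(n):\n    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)")
]

# load the unquantized model (CPU by default), then quantize and save
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_id, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)
```

But I've got a couple of questions: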
- When should you usually quantize a model: before or after the fine-tuning stage?
- Is there a particular reason it's better to do it before (or after) fine-tuning?
Thanks!