Quantizing a model on an M1 Mac for QLoRA

This may be more of a PEFT question than a Transformers question.

I’d like to fine-tune a Mistral-7B model on my 32 GB M1 Pro MacBook. I’ve found that MLX supports 4-bit QLoRA fine-tuning, but I’d like to stay in the HF + PyTorch ecosystem if possible. In BF16, my machine spends more time handling swap than using the GPU.

bitsandbytes supports 4-bit quantization of model weights, but its quantized layers aren’t available on non-CUDA backends like MPS.
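
For concreteness, this is the standard loading path I’d like to use (a minimal sketch using Transformers’ `BitsAndBytesConfig`; the model ID is just my assumption for the base checkpoint), which doesn’t work on my machine since bitsandbytes’ 4-bit kernels are CUDA-only:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Typical 4-bit QLoRA setup: NF4 quantization with BF16 compute.
# This works on CUDA GPUs but fails on Apple Silicon (MPS), since
# bitsandbytes' quantized layers have no Metal backend.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # assumed base checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```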

Any recommendations on how this could be done? Thanks!