Quantizing a model on an M1 Mac for QLoRA

This may be more of a PEFT question than a Transformers question.

I’d like to fine-tune a Mistral-7B model on my 32 GB M1 Pro MacBook. I’ve found that MLX supports 4-bit QLoRA fine-tuning, but I’d like to stay in the HF + PyTorch ecosystem if possible. In BF16, my machine spends more time handling swap than using the GPU.

bitsandbytes supports 4-bit quantization of model weights, but its quantized layers aren’t available on non-CUDA backends like MPS.
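
For concreteness, this is the standard loading path I’d like to use (a minimal sketch using Transformers’ `BitsAndBytesConfig`; the model ID is just my assumption for the base checkpoint), which doesn’t work on my machine since bitsandbytes’ 4-bit kernels are CUDA-only:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Typical 4-bit QLoRA setup: NF4 quantization with BF16 compute.
# This works on CUDA GPUs but fails on Apple Silicon (MPS), since
# bitsandbytes' quantized layers have no Metal backend.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # assumed base checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```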

Any recommendations on how this could be done? Thanks!