If you want to run inference of quantized LLMs on CPU, I'd recommend taking a look at the llama.cpp project: GitHub - ggerganov/llama.cpp: LLM inference in C/C++. It leverages a format called GGUF.
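For example, here's a minimal sketch of running a GGUF model on CPU via the llama-cpp-python bindings (`pip install llama-cpp-python`); the model filename and prompt are just placeholders for whatever GGUF file you have locally:

```python
# Minimal CPU inference sketch with llama-cpp-python.
# "model.q4_K_M.gguf" is a placeholder for any quantized GGUF file.
from llama_cpp import Llama

llm = Llama(model_path="model.q4_K_M.gguf", n_ctx=2048)  # runs on CPU by default

output = llm("Q: What is the GGUF format used for? A:", max_tokens=64)
print(output["choices"][0]["text"])
```

You can also skip the Python bindings entirely and use the CLI binaries that ship with llama.cpp, but the bindings are convenient if the rest of your pipeline is in Python.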
There's now also the MLX framework by Apple, which lets you run these models on MacBooks: GitHub - ml-explore/mlx: MLX: An array framework for Apple silicon.
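If you go the MLX route, the companion mlx-lm package (`pip install mlx-lm`) gives you a high-level API; a rough sketch, where the model id is just an example of an MLX-converted repo from the Hub:

```python
# Sketch of text generation on Apple silicon with mlx-lm.
# The model id below is an example; any MLX-converted model from the Hub should work.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")
text = generate(model, tokenizer, prompt="What is MLX?", verbose=True)
print(text)
```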
What you could do is train a model using the Hugging Face tooling (PEFT, TRL, Transformers) and then export it to the GGUF format with the conversion script: llama.cpp/convert-hf-to-gguf.py at master · ggerganov/llama.cpp · GitHub. You can then run your quantized model on CPU, as sketched below.
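Here's a rough sketch of that workflow, assuming you fine-tuned a LoRA adapter with PEFT/TRL (the model name and adapter path are placeholders): merge the adapter back into the base model, save it in Hugging Face format, then point the conversion script at that folder.

```python
# Sketch: merge a PEFT/LoRA adapter into its base model and save it
# so llama.cpp's convert-hf-to-gguf.py can convert it to GGUF.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder base model
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")          # placeholder adapter path
model = model.merge_and_unload()  # fold the LoRA weights into the base model

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model.save_pretrained("merged-model")
tokenizer.save_pretrained("merged-model")

# Then, from the llama.cpp repo, something along the lines of:
#   python convert-hf-to-gguf.py merged-model --outfile model.gguf
# should give you a GGUF file you can quantize further and run on CPU.
```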