Hi,
Thanks for the reply.
I’m not sure I can control how llama.cpp is built, since I’m running a Python script for the training. After training, I save the models and push them to HF:
.....
perform_training()

# save and push the fine-tuned model + tokenizer
model.save_pretrained("./llama-2-7b-chat_fine_tuned")
tokenizer.save_pretrained("./llama-2-7b-chat_fine_tuned")
model.push_to_hub("jonACE/llama-2-7b-chat_fine_tuned", token=hf_token)
tokenizer.push_to_hub("jonACE/llama-2-7b-chat_fine_tuned", token=hf_token)

# save GGUF versions (default quantization, then f16, then q4_k_m)
model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer)
model.push_to_hub_gguf("jonACE/llama-2-7b-chat_fine_tuned", tokenizer, token=hf_token)
model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer, quantization_method="f16")
model.push_to_hub_gguf("jonACE/llama-2-7b-chat_fine_tuned", tokenizer, quantization_method="f16", token=hf_token)
model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer, quantization_method="q4_k_m")
model.push_to_hub_gguf("jonACE/llama-2-7b-chat_fine_tuned", tokenizer, quantization_method="q4_k_m", token=hf_token)
The model object comes from unsloth’s FastLanguageModel:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=2048,
)
model = FastLanguageModel.get_peft_model(model)
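To rule out a stale install on my side, this is roughly how I’d check what I’m actually running. The ./llama.cpp folder is my assumption of where unsloth clones llama.cpp during GGUF conversion; I haven’t verified that:

import subprocess
from importlib.metadata import version

print("unsloth", version("unsloth"))  # installed unsloth release

# If unsloth cloned llama.cpp into ./llama.cpp (assumption), this prints
# the commit it builds the converter from.
result = subprocess.run(
    ["git", "-C", "llama.cpp", "rev-parse", "HEAD"],
    capture_output=True, text=True,
)
print("llama.cpp commit:", result.stdout.strip() or result.stderr.strip())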
Could it be that the Python unsloth package is still building against an older llama.cpp version?
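If so, one thing I could try is pinning llama.cpp myself before the script runs. This is only a sketch of the idea, assuming (unverified on my part) that unsloth reuses an existing ./llama.cpp checkout instead of cloning a fresh one; the tag name is a placeholder:

import subprocess

# Clone llama.cpp and check out a known-good tag before training starts.
# Assumption: unsloth will reuse this existing ./llama.cpp checkout.
subprocess.run(
    ["git", "clone", "https://github.com/ggerganov/llama.cpp"],
    check=True,
)
subprocess.run(
    ["git", "-C", "llama.cpp", "checkout", "b3600"],  # placeholder tag
    check=True,
)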