Hi,
Thanks for the reply.
I’m not sure I can control how llama.cpp is built, since I’m running a Python script for the training. After training, I save the models and push them to HF:
.....
perform_training()

# save and push the fine-tuned model + tokenizer
model.save_pretrained("./llama-2-7b-chat_fine_tuned")
tokenizer.save_pretrained("./llama-2-7b-chat_fine_tuned")
model.push_to_hub("jonACE/llama-2-7b-chat_fine_tuned", token=hf_token)
tokenizer.push_to_hub("jonACE/llama-2-7b-chat_fine_tuned", token=hf_token)

# save GGUF versions (default quantization, then f16, then q4_k_m)
model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer)
model.push_to_hub_gguf("jonACE/llama-2-7b-chat_fine_tuned", tokenizer, token=hf_token)
model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer, quantization_method="f16")
model.push_to_hub_gguf("jonACE/llama-2-7b-chat_fine_tuned", tokenizer, quantization_method="f16", token=hf_token)
model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer, quantization_method="q4_k_m")
model.push_to_hub_gguf("jonACE/llama-2-7b-chat_fine_tuned", tokenizer, quantization_method="q4_k_m", token=hf_token)
The model object comes from unsloth’s FastLanguageModel:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=2048,
)
model = FastLanguageModel.get_peft_model(model)
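To rule out a stale install on my side, this is roughly how I’d check what I’m actually running. The ./llama.cpp folder is my assumption of where unsloth clones llama.cpp during GGUF conversion; I haven’t verified that:

import subprocess
from importlib.metadata import version

print("unsloth", version("unsloth"))  # installed unsloth release

# If unsloth cloned llama.cpp into ./llama.cpp (assumption), this prints
# the commit it builds the converter from.
result = subprocess.run(
    ["git", "-C", "llama.cpp", "rev-parse", "HEAD"],
    capture_output=True, text=True,
)
print("llama.cpp commit:", result.stdout.strip() or result.stderr.strip())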
Could it be that the Python unsloth package is still building against an older llama.cpp version?
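If so, one thing I could try is pinning llama.cpp myself before the script runs. This is only a sketch of the idea, assuming (unverified on my part) that unsloth reuses an existing ./llama.cpp checkout instead of cloning a fresh one; the tag name is a placeholder:

import subprocess

# Clone llama.cpp and check out a known-good tag before training starts.
# Assumption: unsloth will reuse this existing ./llama.cpp checkout.
subprocess.run(
    ["git", "clone", "https://github.com/ggerganov/llama.cpp"],
    check=True,
)
subprocess.run(
    ["git", "-C", "llama.cpp", "checkout", "b3600"],  # placeholder tag
    check=True,
)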