Issues when trying to build llama.cpp

I’m doing some training and wanted to save a .gguf file for Ollama, but the save failed with the following errors:

make: Entering directory '/home/user/app/llama.cpp'
make: Leaving directory '/home/user/app/llama.cpp'
Makefile:2: *** The Makefile build is deprecated. Use the CMake build instead. For more details, see llama.cpp/docs/build.md at master · ggml-org/llama.cpp · GitHub. Stop.
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
CMake Error at /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find CURL (missing: CURL_LIBRARY CURL_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.22/Modules/FindCURL.cmake:181 (find_package_handle_standard_args)
  common/CMakeLists.txt:88 (find_package)

-- Configuring incomplete, errors occurred!
See also "/home/user/app/llama.cpp/build/CMakeFiles/CMakeOutput.log".
────────────────────────── Traceback (most recent call last) ───────────────────────────
/home/user/.pyenv/versions/3.10.16/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/exec_code.py:121 in exec_func_with_error_handling

/home/user/.pyenv/versions/3.10.16/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py:640 in code_to_exec

/home/user/app/app.py:107 in

104 tokenizer.push_to_hub("jonACE/llama-2-7b-chat_fine_tuned", token=hf_token)      
105                                                                                 
106 # save GGUF versions                                                            

❱ 107 model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer,)
108 model.push_to_hub_gguf("jonACE/llama-2-7b-chat_fine_tuned", tokenizer)
109
110 model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer, quantiza

/home/user/.pyenv/versions/3.10.16/lib/python3.10/site-packages/unsloth/save.py:1805 in unsloth_save_pretrained_gguf

1802 │   │   │   git_clone = install_llama_cpp_clone_non_blocking()                 
1803 │   │   │   python_install = install_python_non_blocking(["gguf", "protobuf"]  
1804 │   │   │   git_clone.wait()                                                   

❱ 1805 │ │ │ makefile = install_llama_cpp_make_non_blocking()
1806 │ │ │ new_save_directory, old_username = unsloth_save_model(**arguments
1807 │ │ │ python_install.wait()
1808 │ │ pass

/home/user/.pyenv/versions/3.10.16/lib/python3.10/site-packages/unsloth/save.py:785 in install_llama_cpp_make_non_blocking

 782 │   │   n_jobs = max(int(psutil.cpu_count()), 1) # Use less CPUs since 1.5x f  
 783 │   │   check = os.system("cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_  
 784 │   │   if check != 0:                                                         

❱ 785 │ │ │ raise RuntimeError(f"*** Unsloth: Failed compiling llama.cpp usin
786 │ │ pass
787 │ │ # f"cmake --build llama.cpp/build --config Release -j{psutil.cpu_coun
788 │ │ full_command = [
────────────────────────────────────────────────────────────────────────────────────────
RuntimeError: *** Unsloth: Failed compiling llama.cpp using os.system(…) with error
256. Please report this ASAP!
Stopping…

Is this a known issue?

How can I fix this?


It seems that the command for building llama.cpp has changed. Please refer to the following GitHub description.

Hi,

Thanks for the reply.

I’m not sure I can control how llama.cpp is built, since I’m running a Python script that does the training and then, after training, saves the model and pushes it to the Hugging Face Hub:

.....
perform_training()

model.save_pretrained("./llama-2-7b-chat_fine_tuned")
tokenizer.save_pretrained("./llama-2-7b-chat_fine_tuned")

model.push_to_hub("jonACE/llama-2-7b-chat_fine_tuned", token=hf_token)
tokenizer.push_to_hub("jonACE/llama-2-7b-chat_fine_tuned", token=hf_token)


# save GGUF versions
model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer,)
model.push_to_hub_gguf("jonACE/llama-2-7b-chat_fine_tuned", tokenizer, token=hf_token)

model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer, quantization_method = "f16")
model.push_to_hub_gguf("jonACE/llama-2-7b-chat_fine_tuned", tokenizer, quantization_method = "f16", token=hf_token)

model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer, quantization_method = "q4_k_m")
model.push_to_hub_gguf("jonACE/llama-2-7b-chat_fine_tuned", tokenizer, quantization_method = "q4_k_m", token=hf_token)

The model object is the result of unsloth.FastLanguageModel:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=2048
)

model = FastLanguageModel.get_peft_model(model)

Could it be that the Python ‘unsloth’ package is still based on an older llama.cpp build process?


There is a new issue on GitHub about error 256, so it may be a problem with the latest version of the library. I think you can get around it by saving the model as safetensors and converting it to GGUF with the conversion script that ships with llama.cpp, but it would be better if it could be fixed more easily…
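For reference, the manual route would look roughly like this. This is only a sketch, not tested in your setup: it assumes the fine-tuned weights have already been merged and saved in Hugging Face format under ./llama-2-7b-chat_fine_tuned (for a LoRA/PEFT model you would need to merge first, e.g. with Unsloth’s save_pretrained_merged), and that the converter is named convert_hf_to_gguf.py as in current llama.cpp checkouts:

git clone https://github.com/ggml-org/llama.cpp
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py ./llama-2-7b-chat_fine_tuned --outfile llama-2-7b-chat_fine_tuned-f16.gguf --outtype f16

The point is that the converter only needs Python, so the failing CMake build step is skipped entirely; producing quantized variants such as q4_k_m would still require the compiled llama-quantize tool.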

I wonder if it can be avoided by installing cmake or something.

pip install cmake

Issue

Workaround
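Also worth checking: the CMake log in the first post actually fails on “Could NOT find CURL”, so installing cmake alone may not be enough. On a Debian/Ubuntu machine the missing piece would typically be the curl development headers, or, if you can run the CMake configure step yourself, the dependency can be switched off. Again just a sketch, assuming an apt-based system:

sudo apt-get install libcurl4-openssl-dev

or

cmake llama.cpp -B llama.cpp/build -DLLAMA_CURL=OFF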

Hi,

I’m using Streamlit, which runs the Python app containing the training code as well as the conversion and uploading. I’m not sure I can run ‘pip install cmake’ in that environment.

I need some more help on this.

Thanks!


If you are using official sample code or an official container, it will probably be easier to pin down the problem.
And if it is an official container, a missing dependency sounds like a bug…

I am using this as a reference:


There have been similar cases and workarounds for them, but the right method depends on the environment you’re using…

If it’s a local Python environment, use pip as mentioned above; if it’s Colab, it looks like this:

!pip install cmake

If it’s a container, how you write and run it differs depending on the container software…
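For example, with a Debian/Ubuntu-based image, the usual approach is to add the build dependencies in the Dockerfile (a sketch; package names assume an apt-based distribution):

RUN apt-get update && apt-get install -y git build-essential cmake libcurl4-openssl-dev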