Issues when trying to build llama.cpp

jonACE · April 2, 2025, 7:37pm

I’m fine-tuning a model and wanted to save it as a GGUF file for Ollama, but the save failed with the following errors:

make: Entering directory '/home/user/app/llama.cpp'
make: Leaving directory '/home/user/app/llama.cpp'
Makefile:2: *** The Makefile build is deprecated. Use the CMake build instead. For more details, see https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md. Stop.
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
CMake Error at /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find CURL (missing: CURL_LIBRARY CURL_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.22/Modules/FindCURL.cmake:181 (find_package_handle_standard_args)
  common/CMakeLists.txt:88 (find_package)

-- Configuring incomplete, errors occurred!
See also "/home/user/app/llama.cpp/build/CMakeFiles/CMakeOutput.log".
────────────────────────── Traceback (most recent call last) ───────────────────────────
/home/user/.pyenv/versions/3.10.16/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/exec_code.py:121 in exec_func_with_error_handling

/home/user/.pyenv/versions/3.10.16/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py:640 in code_to_exec

/home/user/app/app.py:107 in <module>

  104 tokenizer.push_to_hub("jonACE/llama-2-7b-chat_fine_tuned", token=hf_token)
  105
  106 # save GGUF versions
❱ 107 model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer,)
  108 model.push_to_hub_gguf("jonACE/llama-2-7b-chat_fine_tuned", tokenizer)
  109
  110 model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer, quantiza

/home/user/.pyenv/versions/3.10.16/lib/python3.10/site-packages/unsloth/save.py:1805 in unsloth_save_pretrained_gguf

  1802 │   │   │   git_clone = install_llama_cpp_clone_non_blocking()
  1803 │   │   │   python_install = install_python_non_blocking(["gguf", "protobuf"]
  1804 │   │   │   git_clone.wait()
❱ 1805 │   │   │   makefile = install_llama_cpp_make_non_blocking()
  1806 │   │   │   new_save_directory, old_username = unsloth_save_model(**arguments
  1807 │   │   │   python_install.wait()
  1808 │   │   pass

/home/user/.pyenv/versions/3.10.16/lib/python3.10/site-packages/unsloth/save.py:785 in install_llama_cpp_make_non_blocking

   782 │   │   n_jobs = max(int(psutil.cpu_count()), 1) # Use less CPUs since 1.5x f
   783 │   │   check = os.system("cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_
   784 │   │   if check != 0:
❱  785 │   │   │   raise RuntimeError(f"*** Unsloth: Failed compiling llama.cpp usin
   786 │   │   pass
   787 │   │   # f"cmake --build llama.cpp/build --config Release -j{psutil.cpu_coun
   788 │   │   full_command = [
────────────────────────────────────────────────────────────────────────────────────────
RuntimeError: *** Unsloth: Failed compiling llama.cpp using os.system(...) with error 256. Please report this ASAP!
Stopping…

Is this a known issue?

How can I fix this?

John6666 · April 3, 2025, 1:59am

It seems that the command for building llama.cpp has changed. Please refer to the following GitHub documentation.

github.com/ggml-org/llama.cpp

docs/build.md

master

# Build llama.cpp locally

**To get the Code:**

```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
```

The following sections describe how to build with different backends and options.

## CPU Build

Build llama.cpp using `CMake`:

```bash
cmake -B build
cmake --build build --config Release
```
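Since the training script already drives everything from Python, the two CMake commands quoted above could also be issued from there. The following is only a sketch: the function names are made up for illustration, and the `-DLLAMA_CURL=OFF` switch is an assumption based on the `Could NOT find CURL` error in the log, so check `docs/build.md` in your checkout before relying on it.

```python
import subprocess


def cmake_commands(source_dir="llama.cpp", build_dir="llama.cpp/build",
                   use_curl=True, jobs=None):
    """Return the configure and build command lines as argument lists.

    Hypothetical helper: it only assembles the commands, it runs nothing.
    """
    configure = ["cmake", "-S", source_dir, "-B", build_dir]
    if not use_curl:
        # Assumption: the checkout exposes a LLAMA_CURL CMake option.
        configure.append("-DLLAMA_CURL=OFF")
    build = ["cmake", "--build", build_dir, "--config", "Release"]
    if jobs:
        build += ["-j", str(jobs)]
    return [configure, build]


def run_build(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try.
    for argv in cmake_commands(use_curl=False, jobs=4):
        print(" ".join(argv))
```

Calling `run_build(cmake_commands(use_curl=False))` would perform the actual build. Passing argument lists to `subprocess.run` avoids shell quoting problems and surfaces the failing step as an exception.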


jonACE · April 3, 2025, 5:13am

Hi,

Thanks for the reply.

I’m not sure I can control how llama.cpp is built, because I’m running a Python script for the training. After the training, the script saves the resulting models and pushes them to HF:

.....
perform_training()

model.save_pretrained("./llama-2-7b-chat_fine_tuned")
tokenizer.save_pretrained("./llama-2-7b-chat_fine_tuned")

model.push_to_hub("jonACE/llama-2-7b-chat_fine_tuned", token=hf_token)
tokenizer.push_to_hub("jonACE/llama-2-7b-chat_fine_tuned", token=hf_token)


# save GGUF versions
model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer,)
model.push_to_hub_gguf("jonACE/llama-2-7b-chat_fine_tuned", tokenizer, token=hf_token)

model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer, quantization_method = "f16")
model.push_to_hub_gguf("jonACE/llama-2-7b-chat_fine_tuned", tokenizer, quantization_method = "f16", token=hf_token)

model.save_pretrained_gguf("./llama-2-7b-chat_fine_tuned", tokenizer, quantization_method = "q4_k_m")
model.push_to_hub_gguf("jonACE/llama-2-7b-chat_fine_tuned", tokenizer, quantization_method = "q4_k_m", token=hf_token)

The model object was a result of unsloth.FastLanguageModel:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=2048
)

model = FastLanguageModel.get_peft_model(model)

Could it be that the Python ‘unsloth’ component is still based on older llama.cpp versions?

John6666 · April 3, 2025, 5:28am

There is a new issue on GitHub about the error 256, so it may be a problem with the latest version of the library. I think you can get around it by saving the model as safetensors and converting it to GGUF with the conversion script that ships with llama.cpp, but it would be better if it could be fixed more easily…
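A rough sketch of that workaround, assuming the llama.cpp checkout contains `convert_hf_to_gguf.py` and that the model directory holds full merged weights rather than only a LoRA adapter. The helper name and the paths are placeholders for illustration.

```python
import sys
from pathlib import Path


def gguf_convert_command(llama_cpp_dir, model_dir, outfile, outtype="f16"):
    """Build the command line for llama.cpp's HF-to-GGUF conversion script.

    Hypothetical helper: it only assembles the arguments, it runs nothing.
    """
    script = Path(llama_cpp_dir) / "convert_hf_to_gguf.py"
    return [
        sys.executable, str(script), str(model_dir),
        "--outfile", str(outfile),
        "--outtype", outtype,
    ]


if __name__ == "__main__":
    # Assumption: the directory was written with merged 16-bit weights,
    # for example via Unsloth's save_pretrained_merged, not adapter-only.
    cmd = gguf_convert_command(
        "llama.cpp",
        "./llama-2-7b-chat_fine_tuned",
        "./llama-2-7b-chat_fine_tuned.f16.gguf",
    )
    print(" ".join(cmd))
    # To run it for real: subprocess.run(cmd, check=True)
```

The f16 conversion is a pure Python step, so it does not need the compiled llama.cpp binaries. A q4_k_m file would still need the compiled quantize tool, so this sidesteps the build only for the unquantized output.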

I wonder if it can be avoided by installing cmake or something.

pip install cmake
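For what it is worth, the log in the first post shows CMake 3.22 already running and stopping at the CURL lookup, so it may help to read the two clues in the output before installing anything. The sketch below uses helper names of my own: it pulls the missing package out of a CMake configure log and decodes the value returned by `os.system`. On Debian or Ubuntu the CURL headers usually come from a package such as `libcurl4-openssl-dev`, but that name is an assumption about the system.

```python
import os
import re

# Matches lines such as:
#   Could NOT find CURL (missing: CURL_LIBRARY CURL_INCLUDE_DIR)
MISSING = re.compile(r"Could NOT find (\w+) \(missing: ([^)]*)\)")


def missing_packages(log_text):
    """Map each package CMake could not find to its missing variables."""
    return {m.group(1): m.group(2).split() for m in MISSING.finditer(log_text)}


def exit_code_from_system(status):
    """Turn an os.system() return value into the command's exit code.

    On POSIX, os.system() returns a wait status, so 256 means exit code 1.
    Requires Python 3.9 or newer.
    """
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    sample = (
        "-- Adding CPU backend variant ggml-cpu: -march=native\n"
        "CMake Error at FindPackageHandleStandardArgs.cmake:230 (message):\n"
        "  Could NOT find CURL (missing: CURL_LIBRARY CURL_INCLUDE_DIR)\n"
    )
    print(missing_packages(sample))
    print(exit_code_from_system(256))
```

Read this way, "error 256" is just the configure step exiting with status 1, and the missing piece named in the log is CURL rather than CMake itself.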

Issue

github.com/unslothai/unsloth

Unsloth: Failed compiling llama.cpp using os.system(...) with error 256.

opened 07:47AM - 28 Mar 25 UTC

Fabel73

I tried recreating a notebook from the Documentation. I am using Ubuntu Server and got at the end the following error:

```
make: Entering directory '/root/unsloth/Reasoner/llama.cpp'
Makefile:2: *** The Makefile build is deprecated. Use the CMake build instead. For more details, see https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md. Stop.
make: Leaving directory '/root/unsloth/Reasoner/llama.cpp'
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /bin/git (found version "2.34.1")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
CMake Error at /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find CURL (missing: CURL_LIBRARY CURL_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.22/Modules/FindCURL.cmake:181 (find_package_handle_standard_args)
  common/CMakeLists.txt:88 (find_package)

-- Configuring incomplete, errors occurred!
See also "/root/unsloth/Reasoner/llama.cpp/build/CMakeFiles/CMakeOutput.log".
Unsloth: Saving tokenizer... Done. Done.
Traceback (most recent call last):
  File "/root/unsloth/Reasoner/train.py", line 290, in <module>
    main()
  File "/root/unsloth/Reasoner/train.py", line 286, in main
    model.save_pretrained_gguf("Mistral-Small-Thinker_4", tokenizer, quantization_method="q4_k_m")
  File "/root/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/save.py", line 1805, in unsloth_save_pretrained_gguf
    makefile = install_llama_cpp_make_non_blocking()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/save.py", line 785, in install_llama_cpp_make_non_blocking
    raise RuntimeError(f"*** Unsloth: Failed compiling llama.cpp using os.system(...) with error {check}. Please report this ASAP!")
RuntimeError: *** Unsloth: Failed compiling llama.cpp using os.system(...) with error 256. Please report this ASAP!
```

Workaround

github.com/unslothai/unsloth

Failed at model.save_pretrained_gguf

opened 03:31PM - 16 Apr 24 UTC

closed 07:26AM - 19 Jan 25 UTC

ch-tseng

I use the model https://huggingface.co/taide/TAIDE-LX-7B-Chat to fine-tune, but always got the error. Training is OK, but model.save_pretrained_gguf failed.

(Console log, column-scrambled in the preview. Recoverable details: Unsloth 2024.4 on an NVIDIA GeForce RTX 3090, Pytorch 2.2.2+cu121, CUDA Toolkit 12.1, Linux. The 4bit and LoRA weights are merged to 16bit, the tokenizer and model are saved, and the tensors of the 32-layer model are listed as BF16 while writing ch_taide_medicine.gguf-unsloth.F16.gguf. The run ends with the traceback below; each line is cut off in the preview.)

Traceback (most recent call last):
File "/GPUData/working/unsloth/conve
if True: model.save_pretrained_ggu
File "/home/chtseng/envs/LM2/lib/pyt
file_location = save_to_gguf(model_type,
File "/home/chtseng/envs/LM2/lib/pyt
raise RuntimeError(
RuntimeError: Unsloth: Quantization
You might have to compile llama.cpp
You do not need to close this
You must run this in the same
git clone https://github.com/ggerganov
cd llama.cpp && make
Once that's done, redo the quantizatio
4096] -> blk.26.attn_v.weight | BF16 | [4096, 4096] -> blk.27.attn_norm.weight | BF16 | [4096] -> blk.27.ffn_down.weight | BF16 | [4096, 11008] -> blk.27.ffn_gate.weight | BF16 | [11008, 4096] -> blk.27.ffn_up.weight | BF16 | [11008, 4096] -> blk.27.ffn_norm.weight | BF16 | [4096] -> blk.27.attn_k.weight | BF16 | [4096, 4096] -> blk.27.attn_output.weight | BF16 | [4096, 4096] -> blk.27.attn_q.weight | BF16 | [4096, 4096] -> blk.27.attn_v.weight | BF16 | [4096, 4096] -> blk.28.attn_norm.weight | BF16 | [4096] -> blk.28.ffn_down.weight | BF16 | [4096, 11008] -> blk.28.ffn_gate.weight | BF16 | [11008, 4096] -> blk.28.ffn_up.weight | BF16 | [11008, 4096] -> blk.28.ffn_norm.weight | BF16 | [4096] -> blk.28.attn_k.weight | BF16 | [4096, 4096] -> blk.28.attn_output.weight | BF16 | [4096, 4096] -> blk.28.attn_q.weight | BF16 | [4096, 4096] -> blk.28.attn_v.weight | BF16 | [4096, 4096] -> blk.29.attn_norm.weight | BF16 | [4096] -> blk.29.ffn_down.weight | BF16 | [4096, 11008] -> blk.29.ffn_gate.weight | BF16 | [11008, 4096] -> blk.29.ffn_up.weight | BF16 | [11008, 4096] -> blk.29.ffn_norm.weight | BF16 | [4096] -> blk.29.attn_k.weight | BF16 | [4096, 4096] -> blk.29.attn_output.weight | BF16 | [4096, 4096] -> blk.29.attn_q.weight | BF16 | [4096, 4096] -> blk.29.attn_v.weight | BF16 | [4096, 4096] -> blk.30.attn_norm.weight | BF16 | [4096] -> blk.30.ffn_down.weight | BF16 | [4096, 11008] -> blk.30.ffn_gate.weight | BF16 | [11008, 4096] -> blk.30.ffn_up.weight | BF16 | [11008, 4096] -> blk.30.ffn_norm.weight | BF16 | [4096] -> blk.30.attn_k.weight | BF16 | [4096, 4096] -> blk.30.attn_output.weight | BF16 | [4096, 4096] -> blk.30.attn_q.weight | BF16 | [4096, 4096] -> blk.30.attn_v.weight | BF16 | [4096, 4096] -> blk.31.attn_norm.weight | BF16 | [4096] -> blk.31.ffn_down.weight | BF16 | [4096, 11008] -> blk.31.ffn_gate.weight | BF16 | [11008, 4096] -> blk.31.ffn_up.weight | BF16 | [11008, 4096] -> blk.31.ffn_norm.weight | BF16 | [4096] -> blk.31.attn_k.weight | BF16 | [4096, 
4096] -> blk.31.attn_output.weight | BF16 | [4096, 4096] -> blk.31.attn_q.weight | BF16 | [4096, 4096] -> blk.31.attn_v.weight | BF16 | [4096, 4096] -> output_norm.weight | BF16 | [4096]
format 1
rt__unsloth_to_gguf.py", line 44, in <module>
f("ch_taide_medicine.gguf", tokenizer, quantization_method = "quantized")
hon3.10/site-packages/unsloth/save.py", line 1333, in unsloth_save_pretrained_gguf
new_save_directory, quantization_method, first_conversion, makefile)
hon3.10/site-packages/unsloth/save.py", line 957, in save_to_gguf
failed for ./ch_taide_medicine.gguf-unsloth.F16.gguf
yourself, then run this again.
Python program. Run the following commands in a new terminal:
folder as you're saving your model.
/llama.cpp clean && LLAMA_CUDA=1 make all -j
n.

jonACE · April 3, 2025, 8:51am

Hi,

I’m using Streamlit, which runs the Python app that contains the training code as well as the conversion and upload steps. I’m not sure I can run ‘pip install cmake’ in that setup.

I need some more help on this.

Thanks!

John6666 · April 3, 2025, 9:00am

If you are using official sample code or an official container, the problem will probably be easier to track down.
And if something is missing from an official container, that looks like a bug…

jonACE · April 3, 2025, 9:04am

I am using this as a reference:

John6666 · April 3, 2025, 9:43am

There have been similar cases with known fixes, but the right method depends on the environment you’re using…

In a regular local Python environment, use pip as mentioned above; in Colab, run this in a notebook cell:

!pip install cmake

In a container, how you write and run the install step depends on the container software…
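
If you can't open a terminal because everything runs inside the Streamlit app, one option is to do the check and the install from Python itself. This is only a minimal sketch, not something taken from Unsloth or llama.cpp: the helper name `find_or_install` is made up for illustration, and it assumes the app is allowed to run pip and has network access. It looks for `cmake` on PATH (and next to the running interpreter), and if it is missing, installs the `cmake` package from PyPI with the same interpreter that runs the app.

```python
import os
import shutil
import subprocess
import sys
from typing import Optional


def find_or_install(tool: str, pip_package: str, install: bool = True) -> Optional[str]:
    """Return the full path to `tool`, or None if it cannot be found.

    Hypothetical helper: if the tool is missing and `install` is True,
    try `pip install <pip_package>` first and then look again.
    """
    # Also search the interpreter's own directory, because pip puts console
    # scripts there and a venv's bin folder is not always on PATH.
    search_path = os.pathsep.join(
        [os.path.dirname(sys.executable), os.environ.get("PATH", "")]
    )

    found = shutil.which(tool, path=search_path)
    if found is None and install:
        # Use the interpreter running the app so the package lands in the
        # same environment. check=False: a failed install just returns None.
        subprocess.run(
            [sys.executable, "-m", "pip", "install", pip_package],
            check=False,
        )
        found = shutil.which(tool, path=search_path)
    return found


# Example call, placed before the GGUF conversion step:
#   cmake_path = find_or_install("cmake", "cmake")
#   if cmake_path is None: report that cmake could not be installed
```

Whether this is enough depends on the environment: the build may still need the tool's folder on PATH, and other missing system packages cannot be installed this way.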
