Hi,
I’m getting an error while trying to run a quantized version of the Nvidia Llama 3.1 Minitron model using Ollama. I’d appreciate any help that I can get.
Model Details:
- Model: Llama-3.1-Minitron-4B-Width-Base.Q4_K.gguf
- Source: https://huggingface.co/legraphista/Llama-3.1-Minitron-4B-Width-Base-GGUF
- Quantization: Q4_K (4-bit)
- File size: 2.78 GB
- Original model parameters: 4.51B
Steps I’ve taken:
- Downloaded the model file: Llama-3.1-Minitron-4B-Width-Base.Q4_K.gguf
- Created a modelfile with the following contents:
FROM ./Llama-3.1-Minitron-4B-Width-Base.Q4_K.gguf
SYSTEM """
You are Swedish Chef from the classic Muppet series. You answer every question
"""
- Created the model in Ollama:
ollama create swede -f ./modelfile
This step completed successfully.
- Attempted to run the model:
ollama run swede
Initial error encountered:
Error: llama runner process has terminated: signal: aborted (core dumped)
error loading model: check_tensor_dims: tensor 'blk.0.attn_q.weight' has wrong shape; expected 3072, 3072, got 3072, 4096, 1, 1
llama_load_model_from_file: exception loading model
After updating Ollama, new error encountered:
Error: llama runner process has terminated: GGML_ASSERT(c->ne[0] >= n_dims / 2) failed
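In case it helps with diagnosis, here is how I could dump the tensor shapes actually stored in the GGUF file, to compare the on-disk shape of blk.0.attn_q.weight with the shape the first error message expects. This is a minimal sketch using the gguf Python package from the llama.cpp project; attribute names may differ between package versions:

from gguf import GGUFReader  # pip install gguf

# Open the quantized model and print every tensor name with its on-disk shape,
# so blk.0.attn_q.weight can be checked against the shape in the error message.
reader = GGUFReader("./Llama-3.1-Minitron-4B-Width-Base.Q4_K.gguf")
for tensor in reader.tensors:
    print(tensor.name, list(tensor.shape))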
System Information (using CPU):
- Ollama version: 0.3.6
- OS: Ubuntu 22.04.4 LTS x86_64 (server)
- CPU: QEMU Virtual version 2.1.2 (8)
- GPU: 00:02.0 Cirrus Logic GD 5446
- Memory: 302MiB / 11956MiB
Additional Information:
- The model creation process with Ollama seemed to succeed initially.
- The Hugging Face page suggests using llama.cpp for this model, but I’m trying to use it with Ollama. (I could load the file with llama.cpp directly to rule out the GGUF itself; see the sketch after this list.)
- Other quantization levels are available (Q8_0, Q6_K, Q3_K, Q2_K), but I haven’t tried them yet.
- After updating Ollama, the error message changed, but the model still fails to run.
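To narrow down whether the problem is in the GGUF file itself or in Ollama’s loader, I could try loading the same file outside of Ollama. Below is a minimal sketch using the llama-cpp-python bindings (my own choice of tool; the Hugging Face page only mentions llama.cpp generally). The same script could also be pointed at one of the other quant files:

from llama_cpp import Llama  # pip install llama-cpp-python

# Attempt to load the exact GGUF that Ollama rejects; if this also aborts,
# the issue is likely the file/architecture rather than Ollama itself.
llm = Llama(
    model_path="./Llama-3.1-Minitron-4B-Width-Base.Q4_K.gguf",
    n_ctx=2048,
)
out = llm("Bork bork bork! Who are you?", max_tokens=32)
print(out["choices"][0]["text"])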
Questions:
- Are the GGUF format and the Q4_K quantization level supported by Ollama 0.3.6?
- Could the new error (GGML_ASSERT failed) be related to the model’s compatibility with Ollama?
- Do you recommend trying a different quantization level, like Q8_0? (If so, I’d download it as in the sketch after these questions.)
- Are there any specific steps I should take to make this model compatible with Ollama 0.3.6?
- Could this be related to the model’s architecture or the way it was quantized?
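For reference, if a different quantization level is worth trying, I would fetch it from the same repository like this (a sketch using huggingface_hub; I’m assuming the Q8_0 file follows the same naming pattern as the Q4_K one):

from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Download the Q8_0 quant from the same repo as the Q4_K file; the filename
# is assumed to follow the same pattern and may need adjusting.
path = hf_hub_download(
    repo_id="legraphista/Llama-3.1-Minitron-4B-Width-Base-GGUF",
    filename="Llama-3.1-Minitron-4B-Width-Base.Q8_0.gguf",
    local_dir=".",
)
print(path)

The downloaded path would then go into the FROM line of a new Modelfile.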
I’d greatly appreciate any help resolving this issue, or suggestions for alternative approaches. Thanks in advance!