Can’t pull the Q4 model from IlyaGusev/saiga_llama3_8b_gguf on Hugging Face via the “Use this model” -> Ollama button.
All Q2, Q4, Q8, and F16 quants exist in the repository and I can download the files, but “Use this model” -> Ollama only works with Q2_K, Q8_0, and F16. It worked a month ago.
By default the model field is empty, but if I copy the command I get: “ollama run hf.co/IlyaGusev/saiga_llama3_8b_gguf:Q4_K_M”.
If I run ollama pull hf.co/IlyaGusev/saiga_llama3_8b_gguf:Q4_K_M, the output is “The specified tag is not available in the repository. Please use another tag or latest”.
If I change the tag and run ollama pull hf.co/IlyaGusev/saiga_llama3_8b_gguf:Q4_K, the output is “The specified tag is not a valid quantization scheme. Please use another tag or latest”.
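For reference, a minimal sketch of the commands involved, assuming the hf.co/{user}/{repo}:{quant} tag syntax that the “Use this model” button generates (the specific tags below just mirror the behaviour described above):
```
# Tags that the button / pull currently resolve for this repo:
ollama pull hf.co/IlyaGusev/saiga_llama3_8b_gguf:Q2_K
ollama pull hf.co/IlyaGusev/saiga_llama3_8b_gguf:Q8_0
ollama pull hf.co/IlyaGusev/saiga_llama3_8b_gguf:F16

# Tags that currently fail, even though the corresponding files exist in the repo:
ollama pull hf.co/IlyaGusev/saiga_llama3_8b_gguf:Q4_K_M   # "tag is not available in the repository"
ollama pull hf.co/IlyaGusev/saiga_llama3_8b_gguf:Q4_K     # "tag is not a valid quantization scheme"
```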
Is this an HF issue?
I’m not sure if this is an HF issue or an Ollama issue…
Either way, it’s an issue.
Related GitHub issue (opened 12 Jan 2025, labeled “bug”):
## Bug description
The configuration here: https://github.com/huggingface/chat-ui/blob/main/docs/source/configuration/models/providers/ollama.md says that the model name should be "mistral". It should be "mistral:latest" instead. Furthermore, the auto-pulling of ollama models originally added here: https://github.com/huggingface/chat-ui/pull/1227 does not work with this model name inconsistency.
## Steps to reproduce
Vanilla chat-ui setup with Ollama installed. Use the config from the docs above. Download Mistral with "ollama pull mistral". chat-ui is then unable to use ollama/mistral.
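A minimal reproduction sketch, assuming a default local Ollama listening on port 11434:
```
# Pull the model the way the docs suggest; Ollama registers it under the "latest" tag.
ollama pull mistral

# List the locally available models; the name comes back as "mistral:latest",
# not "mistral", which is what chat-ui fails to match against the configured model name.
curl http://localhost:11434/api/tags
```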
## Context
### Logs
Here's a snippet from ollama's tags:
```
$> curl http://localhost:11434/api/tags
{"models":[{"name":"mistral:latest","model":"mistral:latest","modified_at":"2025-01-11T16:17:32.785621658-08:00","size":4113301824,"digest":"f974a74358d62a017b37c6f424fcdf2744ca02926c4f952513ddf474b2fa5091","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"7.2B","quantization_level":"Q4_0"}},{"name":"tinyllama:latest","model":"tinyllama:latest","modified_at":"2025-01-11T16:03:16.107607114-08:00","size":637700138,"digest":"2644915ede352ea7bdfaff0bfac0be74c719d5d5202acb63a6fb095b52f394a4","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"1B","quantization_level":"Q4_0"}}]}
```
### Specs
- **OS**: Mac
- **Browser**: Firefox
- **chat-ui commit**: f82af0b
## Notes
If you change the config to:
```
MONGODB_URL=mongodb://localhost:27017/
MODELS=`[
  {
    "name": "Ollama Mistral",
    "chatPromptTemplate": "<s>{{#each messages}}{{#ifUser}}[INST] {{#if @first}}{{#if @root.preprompt}}{{@root.preprompt}}\n{{/if}}{{/if}} {{content}} [/INST]{{/ifUser}}{{#ifAssistant}}{{content}}</s> {{/ifAssistant}}{{/each}}",
    "parameters": {
      "temperature": 0.1,
      "top_p": 0.95,
      "repetition_penalty": 1.2,
      "top_k": 50,
      "truncate": 3072,
      "max_new_tokens": 1024,
      "stop": ["</s>"]
    },
    "endpoints": [
      {
        "type": "ollama",
        "url": "http://127.0.0.1:11434",
        "ollamaName": "mistral:latest"
      }
    ]
  }
]`
```
Notice the ollamaName: "mistral:latest"; with that set, it works.
I’m not sure if this is a bug, but I noticed an inconsistency in the file type labels for these GGUF models:
Model (Q2_K):
Model (Q4_K):
The Q2_K file is labeled simply as Q2_K, while the Q4_K file shows Q4_K_M. Is this a labeling inconsistency, or could it indicate an issue with the GGUF file format? Shouldn’t both follow the same naming convention (e.g., Qx_K)? Could this affect compatibility, or is it just a metadata oversight?
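One way to check where the label comes from is to read the quantization recorded in the GGUF header itself, independent of the file name. A minimal sketch, assuming the gguf Python package (which ships a gguf-dump script) and locally downloaded copies of the two files (the filenames below are placeholders):
```
pip install gguf

# general.file_type in the GGUF header records the actual quantization scheme
# of the file, regardless of what the file happens to be named.
gguf-dump saiga_llama3_q2_K.gguf | grep file_type   # placeholder filename
gguf-dump saiga_llama3_q4_K.gguf | grep file_type   # placeholder filename
```
If the header of the "Q4_K" file reports Q4_K_M, the label on the Hub is just reflecting the metadata rather than the file name.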
There are no specific rules for GGUF file names… Either naming seems fine.
If it was working fine before (a month ago), then either Ollama or HF used to resolve the files by file name and has since changed how it matches them.
> Hello, I'm wondering what quantization method or what you want to call it has the best output quality. Should you use q8_0, q4_0 or anything in between? I'm asking this question because the q8_0 ve...
I don’t know, but I have successfully used a q4 model before.
As John6666 said, it seems like the HF “Use this model” integration became more “strict” and began checking the quantization name in the model metadata against the file name / quantization tag.