[Question] How to specify 'model_type' of 'Qwen/Qwen3-VL-8B-Instruct-GGUF'?

Hi there,

I’m trying to use ‘Qwen3VL-8B-Instruct-Q8_0.gguf’ via transformers.

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Qwen/Qwen3-VL-8B-Instruct-GGUF", dtype="auto")

When I try to load the model with the code above, this error message appears:
‘ValueError: Unrecognized model in Qwen/Qwen3-VL-8B-Instruct-GGUF. Should have a model_type key in its config.json, or contain one of the following strings in its name: aimv2,…’

Now the question is: how do I specify ‘model_type’ in a Python script?

Thanks in advance.


I think there’s a mismatch between the model format you’re loading and the software you’re loading it with. Basically, just think of GGUF as for llama.cpp or Ollama, and everything else as for Transformers.

While it is indeed possible to load GGUF with Transformers, this is only for extremely specialized use cases and offers no particular benefit for general use…

Use non-GGUF model weights with Transformers, and if VRAM is insufficient, employ quantization like bitsandbytes or TorchAO.


What model_type is (and why you’re seeing this error)

In Transformers, AutoModel.from_pretrained(...) first loads a config.json, then uses config["model_type"] to choose which model class to instantiate (e.g., LlamaForCausalLM, Qwen2ForCausalLM, etc.). If there is no config (or no model_type), you get the exact error you saw. (Hugging Face)

So, you generally do not “specify model_type in Python code”. You provide it via the model’s config.json.
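As a rough illustration of that mechanism, the lookup boils down to reading `model_type` out of `config.json`. This is a simplified sketch, not Transformers’ actual resolution code:

```python
import json
import tempfile
from pathlib import Path

# Simplified sketch of the first thing AutoModel does: read config.json and
# use its "model_type" value to pick a model class. Not Transformers' real code.
def resolve_model_type(checkpoint_dir: str) -> str:
    config_path = Path(checkpoint_dir) / "config.json"
    if not config_path.exists():
        # This is the situation with the GGUF repo: no Transformers config at all.
        raise ValueError(f"Unrecognized model in {checkpoint_dir}. "
                         "Should have a model_type key in its config.json")
    config = json.loads(config_path.read_text())
    if "model_type" not in config:
        raise ValueError("config.json has no model_type key")
    return config["model_type"]

# A Transformers-native checkpoint ships a config.json like this one:
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "config.json").write_text(
        json.dumps({"model_type": "qwen3_vl",
                    "architectures": ["Qwen3VLForConditionalGeneration"]})
    )
    print(resolve_model_type(d))  # qwen3_vl
```

So the fix is never “set model_type in the script”; it is “point at a repo whose `config.json` already carries it.”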


Why Qwen/Qwen3-VL-8B-Instruct-GGUF fails with AutoModel

That repository is a GGUF distribution, not a standard Transformers checkpoint repo.

The model card says it contains GGUF weights split into two components:

  • Language model (LLM) GGUF
  • Vision encoder (mmproj) GGUF

…and it’s intended for llama.cpp / other GGUF-based tools. (Hugging Face)

Because of that layout, Transformers’ normal AutoModel path (expecting a Transformers-style config + weights) doesn’t apply.


Even if you “add a config.json”, it still won’t do what you want

Transformers does support loading some GGUF models, but:

  1. Transformers GGUF loading is for training/finetuning workflows: it explicitly says the GGUF checkpoint is dequantized to fp32 so it becomes standard PyTorch weights. (Hugging Face)
  2. It only works for a specific set of architectures (examples listed include Llama, Mistral, Qwen2, etc.), and you must pass gguf_file=... to from_pretrained(). (Hugging Face)
  3. Qwen3-VL GGUF commonly hits “architecture … not supported yet” in Transformers/vLLM paths (end-user reports exist for qwen3vl and qwen3vlmoe). (Hugging Face)

Also, Transformers’ GGUF loader expects a single GGUF in the usual text-model sense; Qwen3-VL GGUF needs both LLM + mmproj to function as a multimodal model. (Hugging Face)

So: creating/overriding model_type won’t turn this GGUF repo into a Transformers-loadable VL model.


What the model_type would be for Qwen3-VL (for reference)

In the Transformers-native Qwen3-VL repo (Qwen/Qwen3-VL-8B-Instruct), the config contains:

  • "model_type": "qwen3_vl"
  • "architectures": ["Qwen3VLForConditionalGeneration"] (Hugging Face)

That’s the correct model_type for Qwen3-VL in Transformers.

But again: that’s for the non-GGUF checkpoint repo, not the GGUF repo.


Correct solutions (pick the one that matches your goal)

Option A) Use Qwen3-VL in Transformers (recommended if you want Transformers APIs)

Use the non-GGUF weights repo and the Qwen3-VL class. The model card explicitly recommends installing Transformers from source for Qwen3-VL support and shows the working code. (Hugging Face)

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

model_id = "Qwen/Qwen3-VL-8B-Instruct"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id,
    dtype="auto",
    device_map="auto",
)

processor = AutoProcessor.from_pretrained(model_id)

If your reason for GGUF was low VRAM, do that on the Transformers-native checkpoint using Transformers quantization (bitsandbytes/AWQ/GPTQ) + flash_attention_2, not GGUF-in-Transformers. (Qwen’s model card explicitly calls out flash_attention_2 for memory saving.) (Hugging Face)


Option B) Use the GGUF Q8_0 you already have (recommended if you specifically want Q8_0.gguf)

Run it with llama.cpp (or compatible tools), providing both the LLM GGUF and the mmproj GGUF. The GGUF model card provides the intended usage and explains the split. (Hugging Face)

This is the GGUF-native approach that actually preserves GGUF’s practical benefits.
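Concretely, an invocation looks roughly like this. Binary and flag names vary across llama.cpp versions, and the mmproj filename below is a placeholder for whatever file the GGUF repo actually ships:

```shell
# Multimodal CLI from llama.cpp: pass the LLM GGUF with -m and the vision
# projector with --mmproj. Names depend on your llama.cpp build; the mmproj
# filename here is a placeholder.
./llama-mtmd-cli \
  -m Qwen3VL-8B-Instruct-Q8_0.gguf \
  --mmproj mmproj-Qwen3VL-8B-Instruct.gguf \
  --image photo.jpg \
  -p "Describe this image."
```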


Option C) If you want to learn Transformers GGUF loading (but not for Qwen3-VL)

Transformers GGUF loading is done via gguf_file=..., and only for supported model families. The official docs include a working example with AutoModelForCausalLM + gguf_file. (Hugging Face)

This is useful for supported text GGUF models, but it does not solve Qwen3-VL GGUF today for the reasons above.
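For completeness, the pattern looks like this sketch. The repo id and filename are placeholders, not a real checkpoint, and the load lines are commented out because they download and dequantize full weights:

```python
# Transformers' GGUF path: pass gguf_file= to from_pretrained. This works only
# for supported text architectures (Llama, Mistral, Qwen2, ...). The repo id
# and filename below are placeholders.
repo_id = "some-org/some-supported-model-GGUF"   # placeholder
gguf_file = "model-Q4_K_M.gguf"                  # placeholder

# from transformers import AutoTokenizer, AutoModelForCausalLM
# tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
# model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
# Note: the GGUF is dequantized to fp32 in memory, so RAM use matches the
# unquantized model -- one reason this path offers no benefit for inference.
print(gguf_file.endswith(".gguf"))
```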


Direct answer to your question

  • You cannot “specify model_type in the Python script” in a way that makes AutoModel recognize Qwen/Qwen3-VL-8B-Instruct-GGUF. model_type is read from config.json, and this repo is a GGUF split-package intended for llama.cpp. (Hugging Face)
  • The correct model_type for Qwen3-VL in Transformers is qwen3_vl, but you should use it via the Transformers-native repo (Qwen/Qwen3-VL-8B-Instruct) and the proper model class. (Hugging Face)

Dear john6666,

Thank you for the detailed information. I just wanted to play with GGUF without creating a new environment.

I’ve already been playing around with the native 4B model for a while and want to know the difference in accuracy between the 4B and the 8B GGUF.

Thank you!! :laughing:


Please delete this info for stray sheep…


This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.