Yeah. But runtime (on-the-fly) quantization is not that difficult. Of course, it does get harder once you start caring about accuracy and speed…
Also, there are many high-quality pre-quantized GGUF files on the Hub, so it's worth searching for those too.
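If you want to search programmatically, here's a minimal sketch using huggingface_hub (GGUF repos are tagged "gguf" on the Hub; the "gemma-3n" search term is just an example):

# pip install -U huggingface_hub
from huggingface_hub import HfApi

api = HfApi()
# GGUF conversions carry the "gguf" tag; adjust the search term as needed
for m in api.list_models(filter="gguf", search="gemma-3n", limit=5):
    print(m.id)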
# pip install -U transformers accelerate bitsandbytes
from transformers import pipeline, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization on the fly, per
# https://huggingface.co/blog/4bit-transformers-bitsandbytes
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,  # quantize the quantization constants too
    bnb_4bit_compute_dtype=torch.bfloat16,
)
pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3n-e4b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # quantization settings must go through model_kwargs so they reach from_pretrained
    model_kwargs={"quantization_config": nf4_config},
)
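A quick smoke test might look like this (a sketch assuming a CUDA GPU with enough free VRAM; the image URL is just a placeholder pointing at one of the HF documentation images):

# Chat-style input for the image-text-to-text pipeline
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
out = pipe(text=messages, max_new_tokens=64, return_full_text=False)
print(out[0]["generated_text"])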