Need help creating an AI chatbot for my app

Yeah. But quantizing at load time (on-the-fly quantization) is not that difficult. Of course, it gets harder if you are particular about accuracy and speed…
Also, there are many high-quality pre-quantized GGUF files on the Hub, so it is worth searching for them.
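
For the GGUF route, here is a minimal sketch of searching the Hub from Python. It assumes the `huggingface_hub` package is installed. The search term, the helper names `find_gguf_repos` and `pick_gguf_file`, and the preference order of quantization types are my own illustrative choices, not anything official.

```python
def find_gguf_repos(search, limit=10):
    """Return repo ids of Hub models tagged 'gguf' that match `search`."""
    # Imported lazily so pick_gguf_file() below works without the package.
    from huggingface_hub import HfApi

    api = HfApi()
    # filter="gguf" restricts results to repos carrying the gguf tag;
    # results are sorted by download count.
    models = api.list_models(search=search, filter="gguf", sort="downloads", limit=limit)
    return [m.id for m in models]


def pick_gguf_file(filenames, preferred=("Q4_K_M", "Q5_K_M", "Q8_0")):
    """Pick the first .gguf file whose name contains a preferred quant tag.

    The preference order is an assumption; adjust it to your memory budget.
    """
    ggufs = [f for f in filenames if f.lower().endswith(".gguf")]
    for tag in preferred:
        for name in ggufs:
            if tag.lower() in name.lower():
                return name
    # Fall back to any GGUF file, or None if the repo has none.
    return ggufs[0] if ggufs else None


# Usage (needs network access):
# from huggingface_hub import list_repo_files
# for repo_id in find_gguf_repos("gemma-3n"):
#     print(repo_id, "->", pick_gguf_file(list_repo_files(repo_id)))
```

GGUF files are meant for llama.cpp-style runtimes rather than the bitsandbytes path below, so this is an alternative, not an extra step.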

# pip install -U transformers accelerate bitsandbytes
from transformers import pipeline, BitsAndBytesConfig
import torch

# https://huggingface.co/blog/4bit-transformers-bitsandbytes
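nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bits while loading
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # run the matmuls in bfloat16
)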

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3n-e4b-it",
    torch_dtype=torch.bfloat16,
    # model_kwargs is forwarded to from_pretrained() when the model is loaded
    model_kwargs={"quantization_config": nf4_config},
)
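
Once the pipeline has loaded, a chatbot is mostly a matter of keeping the message history and feeding it back on every turn. Here is a minimal sketch, assuming the `pipe` object from above and the chat-style message format that the `image-text-to-text` pipeline accepts. `make_message` and `chat_turn` are hypothetical helper names, and `max_new_tokens=256` is an arbitrary choice.

```python
def make_message(role, text):
    """Build one chat message in the list-of-parts format."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


def chat_turn(pipe, history, user_text, max_new_tokens=256):
    """Append the user message, run the model, then append and return the reply."""
    history.append(make_message("user", user_text))
    output = pipe(text=history, max_new_tokens=max_new_tokens)
    # With chat input, generated_text holds the whole conversation;
    # the last entry is the newly generated assistant message.
    reply = output[0]["generated_text"][-1]["content"]
    history.append(make_message("assistant", reply))
    return reply


# Usage (needs the loaded `pipe`):
# history = [make_message("system", "You are a helpful assistant for my app.")]
# print(chat_turn(pipe, history, "Hello!"))
# print(chat_turn(pipe, history, "Can you summarize what you just said?"))
```

The history grows with every turn, so in a real app you would probably trim or summarize old messages to stay within the context window.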