GPTQ and AWQ quantized models don't work

lkthomas · February 19, 2024, 8:16am

I am using a Space to test models with the Docker template + ChatUI. All the quantized models build, but none of them returns a response. When I type a question in the chat box, it keeps loading and never gives an answer. I checked the log, and the last line looks like this:

INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-a10g"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.2), repetition_penalty: Some(1.2), frequency_penalty: None, top_k: Some(50), top_p: Some(0.95), typical_p: None, do_sample: false, max_new_tokens: Some(1024), return_full_text: Some(false), stop: , truncate: Some(1000), watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None } total_time="33.175176192s" validation_time="1.268962ms" queue_time="70.212µs" inference_time="33.173837168s" time_per_token="6.634767433s" seed="Some(3004468518659526318)"}: text_generation_router::server: router/src/server.rs:487: Success
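
For what it's worth, the timing fields in that line can be turned into a rough token count. The sketch below is only an illustration: it assumes that time_per_token is inference_time divided by the number of generated tokens (my reading of the field names, not something the log states), and the helper name parse_durations is made up for this example.

```python
import re

# Timing fields copied verbatim from the log line above.
LOG_EXCERPT = (
    'total_time="33.175176192s" validation_time="1.268962ms" '
    'queue_time="70.212µs" inference_time="33.173837168s" '
    'time_per_token="6.634767433s"'
)

# Conversion factors from the unit suffixes seen in the log to seconds.
UNIT_SECONDS = {"s": 1.0, "ms": 1e-3, "µs": 1e-6, "ns": 1e-9}

# Matches name="<number><unit>"; also tolerates curly quotes from forum rendering.
FIELD = re.compile(r'(\w+)=["“]([\d.]+)(ns|µs|ms|s)["”]')


def parse_durations(text):
    """Hypothetical helper: return {field_name: duration_in_seconds}."""
    return {
        name: float(value) * UNIT_SECONDS[unit]
        for name, value, unit in FIELD.findall(text)
    }


durations = parse_durations(LOG_EXCERPT)

# Assumption: time_per_token = inference_time / generated_tokens.
approx_tokens = round(durations["inference_time"] / durations["time_per_token"])

print(f"inference_time: {durations['inference_time']:.3f} s")
print(f"time_per_token: {durations['time_per_token']:.3f} s")
print(f"approx. generated tokens: {approx_tokens}")
```

Under that assumption the request produced about 5 tokens in roughly 33 seconds, even though the router logged Success. So the request does seem to complete on the server side, just very slowly and with very little output.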

Does anyone know why?
