Serving AWQ models without a custom container

p-christ · November 7, 2023, 8:11am

Is it possible to serve AWQ models using Hugging Face's Inference Endpoints without using a custom container?

meganariley · November 13, 2023, 1:18pm

Hi @p-christ! Good news: AWQ has been added as a quantization option, so you can now use it with Inference Endpoints.

p-christ · November 13, 2023, 1:55pm

Oh great, thanks a LOT!

Do you also know if there's an easy way to use vLLM with Inference Endpoints?
