Serving AWQ models without a custom container

Is it possible to serve AWQ models using Hugging Face's Inference Endpoints without using a custom container?


Hi @p-christ! Good news: AWQ has been added as a quantization option, so you can now use it with Inference Endpoints :hugs:
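For anyone who wants to do this programmatically rather than through the UI, here is a minimal sketch using `create_inference_endpoint` from `huggingface_hub`. The endpoint name, model repository, cloud/instance values, and image tag below are illustrative assumptions, not an official recipe; note that even with the stock TGI image, environment variables like `QUANTIZE` are passed via the `custom_image` field.

```python
# Sketch: deploy an AWQ-quantized model to Inference Endpoints from Python.
# All names, instance choices, and the image tag are assumptions; adjust
# them to your account, region, and quota.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "my-awq-endpoint",                     # hypothetical endpoint name
    repository="TheBloke/Llama-2-7B-AWQ",  # example AWQ checkpoint
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="medium",                # illustrative; offerings vary
    instance_type="g5.2xlarge",
    custom_image={
        # Stock TGI image; QUANTIZE=awq mirrors the UI's quantization option.
        "health_route": "/health",
        "url": "ghcr.io/huggingface/text-generation-inference:latest",
        "env": {
            "MODEL_ID": "/repository",  # Inference Endpoints mounts the repo here
            "QUANTIZE": "awq",
        },
    },
)

endpoint.wait()  # block until the endpoint reaches the "running" state
print(endpoint.client.text_generation("Hello, AWQ!"))
```

Once the endpoint is running, `endpoint.client` gives you an `InferenceClient` pointed at it, so you can query it exactly like any other text-generation endpoint.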

Oh great, thanks a lot!

Do you also know if there's an easy way of using vLLM with Inference Endpoints?