Is it possible to serve AWQ models using Hugging Face's Inference Endpoints without using a custom container?
Hi @p-christ! Good news: AWQ has been added as a quantization option, so you can now use it with Inference Endpoints.
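As a minimal sketch of what this looks like in practice: endpoints deployed this way run Text Generation Inference, and quantization (AWQ) is chosen at deploy time, so the request body you send to the endpoint is the standard TGI `/generate` payload. The endpoint URL below is a placeholder you would copy from your own endpoint's dashboard; `build_request` is a hypothetical helper for illustration.

```python
import json

# Placeholder: replace with the URL shown on your endpoint's dashboard.
ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"

def build_request(prompt: str, max_new_tokens: int = 64) -> dict:
    """Build a TGI /generate payload. Quantization (AWQ) is selected
    when the endpoint is created, so nothing changes per-request."""
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }

payload = build_request("What is AWQ quantization?")
print(json.dumps(payload))
```

You would then POST this payload to `ENDPOINT_URL/generate` with your `Authorization: Bearer <token>` header, same as for any other Inference Endpoint.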
Oh great, thanks a LOT!
Do you also know if there's an easy way of using vLLM with Inference Endpoints?