4-bit quantization on an Inference Endpoint

Is there a way to quantize a model to 4-bit and run it on an Inference Endpoint? Currently it defaults to 8-bit quantization. One more thing: can I do my own custom quantization in the handler.py file, for example along the lines of the sketch below?
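
Here is a minimal sketch of what I have in mind for handler.py, assuming the endpoint image ships with transformers, torch, and bitsandbytes, and using 4-bit NF4 loading via BitsAndBytesConfig. The EndpointHandler class with `__init__(self, path)` and `__call__(self, data)` is the standard custom-handler interface; the rest (generation settings, input/output keys) is just illustrative:

```python
# handler.py -- minimal sketch of a custom 4-bit handler for an Inference Endpoint
# Assumes transformers, torch, and bitsandbytes are installed in the endpoint image.
from typing import Any, Dict

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig


class EndpointHandler:
    def __init__(self, path: str = ""):
        # 4-bit NF4 quantization via bitsandbytes, instead of the 8-bit default
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(
            path,
            quantization_config=bnb_config,
            device_map="auto",
        )

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Endpoint requests arrive as a dict; "inputs" is the conventional key
        prompt = data.get("inputs", "")
        tokens = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**tokens, max_new_tokens=256)
        text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return {"generated_text": text}
```

Would something like this work on an Inference Endpoint, or does the platform override the quantization settings?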