How to Query the Progress of Inference on a Custom Endpoint Handler?

Hello Hugging Face community,

I’m currently working with a custom handler for my inference pipeline, and I’m trying to understand how I can query the progress of the inference from an endpoint.

Below is the method I’m currently using:

def __call__(self, data: Any) -> Dict[str, str]:
    """
    Args:
        data (:obj:`dict`):
            includes the input data and the parameters for the inference.
    Return:
        A :obj:`dict` containing the base64-encoded image.
    """
    inputs = data.pop("inputs", data)

    # run inference pipeline
    with autocast(device.type):
        image = self.pipe(inputs, guidance_scale=7.5)["sample"][0]

    # encode image as base 64
    buffered = BytesIO()
    image.save(buffered, format="JPEG")
    img_str = base64.b64encode(buffered.getvalue())

    # postprocess the prediction
    return {"image": img_str.decode()}

While this works to get the result, it doesn’t provide any insights into how far the inference has progressed.

My questions are:

  1. How can I modify the above __call__ method to provide updates or feedback about the inference progress?
  2. How do I subsequently query the endpoint to get this progress information?
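
To make the two questions concrete, here is a rough sketch of the shape I have in mind. Everything in it is an assumption made for illustration: `ProgressStore`, `fake_pipe`, `handle`, and the `job_id` and `action` request fields are hypothetical names, and `fake_pipe` is only a stand-in for `self.pipe` so the snippet runs without a model.

```python
import threading
from typing import Callable, Dict, Optional


class ProgressStore:
    """Thread-safe map from a job id to the fraction of steps completed."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._progress: Dict[str, float] = {}

    def update(self, job_id: str, step: int, total_steps: int) -> None:
        with self._lock:
            self._progress[job_id] = step / total_steps

    def get(self, job_id: str) -> Optional[float]:
        # Returns None for a job id that has never been seen.
        with self._lock:
            return self._progress.get(job_id)


def fake_pipe(num_steps: int, on_step: Callable[[int], None]) -> str:
    """Stand-in for self.pipe(...): calls on_step after each simulated step."""
    for step in range(1, num_steps + 1):
        on_step(step)
    return "image"


store = ProgressStore()


def handle(data: dict) -> dict:
    """Hypothetical handler body: either report progress or run inference."""
    job_id = data["job_id"]
    if data.get("action") == "progress":
        return {"job_id": job_id, "progress": store.get(job_id)}

    steps = data.get("num_inference_steps", 50)
    store.update(job_id, 0, steps)
    # The per-step hook is where the real pipeline callback would go.
    result = fake_pipe(steps, lambda s: store.update(job_id, s, steps))
    return {"job_id": job_id, "image": result}
```

As far as I understand, diffusers pipelines accept a per-step callback (`callback` in older releases, `callback_on_step_end` in newer ones), so `store.update` could probably be wired in there. What I can't tell is whether the endpoint would serve a second "progress" request while the first inference request is still running, or whether the in-memory store would even be shared between the two.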

Any help, sample code, or pointers would be greatly appreciated!

Thank you in advance!