Is it possible to have an inference endpoint return a response that isn't JSON?

Hello, I’m using an inference endpoint for an image segmentation task (i.e. a U-Net under the hood), and the output of the pipeline is a segmentation mask, i.e. an image. I’d like my endpoint to simply return this image as the HTTP response body with a reasonable Content-Type header, but from the docs it’s not clear whether anything besides JSON serialization is supported. When I tried returning a bytes object from my handler, for instance, it failed.

I know I could always base64 encode the data and use JSON, but I’d prefer to avoid that additional overhead if it’s possible.
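
For a sense of scale, here is a minimal sketch of the overhead in question. It only measures the size difference between raw bytes and the same bytes base64-encoded inside a JSON body. The "mask" key and the random stand-in payload are assumptions for illustration, not anything the endpoint defines.

```python
import base64
import json
import os


def base64_json_ratio(payload: bytes) -> float:
    """Return len(JSON body with base64 payload) / len(raw payload)."""
    encoded = base64.b64encode(payload).decode("ascii")
    # "mask" is a hypothetical key name chosen for this sketch.
    body = json.dumps({"mask": encoded}).encode("utf-8")
    return len(body) / len(payload)


# Random bytes stand in for an encoded segmentation mask (e.g. a PNG).
raw = os.urandom(300_000)
ratio = base64_json_ratio(raw)
print(f"JSON+base64 body is {ratio:.2f}x the raw size")
```

Base64 maps every 3 input bytes to 4 output characters, so the body grows by roughly a third before any encode/decode time is counted.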

Thanks!

Basically, it should be possible. (For example, we routinely return non-JSON responses from Gradio in Spaces.)
I’d say the documentation exists, but it’s scattered…

After all, in HF it’s often faster to read the code and try it out than to read the documentation diligently from cover to cover.
I’m not a hacker or anything…

I tried to reverse engineer this a bit since it’s not that well documented, and attempted to use the Accept header to indicate that the response should be returned raw. However, it looks like a raw response isn’t one of the supported auto-serialization modes. The error message was:

('\n'
 '                Accept type "application/octet-stream" not supported.\n'
 '                Supported accept types are:\n'
 '                application/json, text/csv, text/plain, image/png, '
 'image/jpeg, image/jpg, image/tiff, image/bmp, image/gif, image/webp, '
 'image/x-image, audio/x-flac, audio/flac, audio/mpeg, audio/x-mpeg-3, '
 'audio/wave, audio/wav, audio/x-wav, audio/ogg, audio/x-audio, audio/webm, '
 'audio/webm;codecs=opus, audio/AMR, audio/amr, audio/AMR-WB, audio/AMR-WB+, '
 'audio/m4a, audio/x-m4a\n'
 '            ')

From what I can tell from that list, there isn’t a way to indicate that the response should be returned raw as the body. I tried text/plain, but that did not work; it appeared to look for some custom serialization logic that didn’t exist.
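
One thing that may be worth trying, since the mask is an image: the supported list above does include image/png. The sketch below only shows how such a request could be built with the standard library. The URL and token are placeholders, the input is assumed to be a PNG, and whether the endpoint actually serializes the handler output as PNG for this Accept value is an assumption that I have not verified.

```python
import urllib.request

# Placeholders for this sketch, not real values.
ENDPOINT_URL = "https://example.invalid/my-endpoint"
API_TOKEN = "hf_xxx"


def build_request(image_bytes: bytes, accept: str = "image/png") -> urllib.request.Request:
    """Build a POST request that asks for the response in the given media type."""
    return urllib.request.Request(
        ENDPOINT_URL,
        data=image_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "image/png",  # assumes the input image is a PNG
            "Accept": accept,  # one of the types named in the error message
        },
    )


def fetch_mask(image_bytes: bytes) -> bytes:
    """Send the request and return the raw response body (not called here)."""
    with urllib.request.urlopen(build_request(image_bytes)) as resp:
        return resp.read()


req = build_request(b"fake image bytes")
print(req.get_method(), req.get_header("Accept"))
```

If the endpoint honors the header, the bytes returned by fetch_mask could be written straight to a .png file without any base64 step.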

Well, I guess it’s faster to receive it as an image than as JSON.
Or maybe it isn’t actually faster…?
Anyway, the Diffusers documentation, for example, recommends receiving and saving images with PIL.Image.

HF documentation is inevitably auto-generated from comments in the code, so apart from the introductory sections, reading the code is faster if you can read it.
Faster still is to imitate someone else who is already doing it well.