Slow inference using most recent docker image

Hello @ojturner,

From your code, it looks like you are using a custom `inference.py` script, is that correct? Could you share it? Have you also tested the latency and overhead using the "zero-code" deployment, i.e., without providing an `inference.py`?
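For reference, a minimal sketch of what a "zero-code" test could look like with the SageMaker Python SDK is below, assuming you deploy that way; the model ID, task, DLC versions, and instance type are placeholders you would adapt to your setup:

```python
# Sketch of a "zero-code" deployment: no inference.py is provided, so the
# default handlers of the Hugging Face Inference Toolkit are used.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

huggingface_model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",  # placeholder model
        "HF_TASK": "text-classification",  # placeholder task
    },
    role=role,
    transformers_version="4.26",  # adjust to the DLC version you are testing
    pytorch_version="1.13",
    py_version="py39",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",  # placeholder instance type
)

# Simple latency check against the default handler
print(predictor.predict({"inputs": "I love this!"}))
```

Comparing the latency of this endpoint against your custom one would help isolate whether the overhead comes from your `inference.py` or from the image itself.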

Could you also share more information about the model, model architecture, and task you are using?