Hello @ojturner,
From your code it looks like you are using a custom inference.py script. Is that correct? Could you share it? Also, have you tested the latency and overhead with the "zero-code" deployment, i.e. without providing an inference.py?
Could you also share more information about which model/model-architecture/task you are using?
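In case it helps, here is a minimal sketch of what I mean by a "zero-code" deployment, assuming you are deploying with the SageMaker Hugging Face Inference Toolkit (where the container handles serialization and prediction from `HF_MODEL_ID` / `HF_TASK` without an inference.py). The model id, task, framework versions, role and instance type are placeholders you would swap for your own setup:

```python
import time

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN

# "Zero-code" configuration: the toolkit loads the model and pipeline
# from these environment variables, no custom inference.py needed.
hub_env = {
    "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",  # placeholder model
    "HF_TASK": "text-classification",                                  # placeholder task
}

model = HuggingFaceModel(
    env=hub_env,
    role=role,
    transformers_version="4.26",  # adjust to versions available in the DLCs
    pytorch_version="1.13",
    py_version="py39",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",  # placeholder instance type
)

# Rough latency check against the endpoint
start = time.perf_counter()
result = predictor.predict({"inputs": "I love this library!"})
print(result, f"latency: {time.perf_counter() - start:.3f}s")
```

Comparing the request latency of this against your custom-script endpoint would help narrow down whether the overhead comes from your inference.py or from the endpoint setup itself.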