Getting CUDA memory error at endpoint - what are my options?

Awesome, thank you. Inferentia seems like the most promising option right now, but it would be very interesting to include g5 in the load testing to see how it compares.