Building an Efficient NLP API

Delighted to share some work from Carted’s ML team.

We discuss how we built a product categorization API with a focus on speed and predictive performance. We share the key recipes that brought our average latency down from 61.63 ms to 4.63 ms, roughly a 13x reduction!

Read all of it here

I am sharing it here because we leveraged many tools from Hugging Face, along with the model compression recipes (coupled with our own) from the “NLP with Transformers” book.