Which Hugging Face model / task for fast and accurate sentence autocomplete used in search boxes?

Hi everyone,

I’m trying to build a simple autocomplete feature for a web-app search box, where the model/AI task predictively completes what the user is typing (like Google suggestions). For example, if the user types “how to install freebsd”, the model should generate a few short possible completions.

Please offer some advice and suggestions. I plan on running the model on a dedicated GPU and would like it to be fast and accurate. The language must be English.

What I’m looking for:

  • Which HF models work best for short, real-time autocomplete?

  • Is text-generation the right way, or should I be using embeddings / another method?

  • Any simple examples or repos showing how to do autocomplete specifically?

Just trying to get some guidance from people who’ve done this before.

Thanks!

In terms of speed and latency, wouldn't an embedding model generally be preferable for that purpose?
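
To make the embedding idea a bit more concrete, here is a minimal sketch of retrieval-style autocomplete: embed a fixed list of candidate queries once, embed what the user has typed, and return the nearest candidates by cosine similarity. Everything named here is an assumption for illustration: `CANDIDATES` is a made-up list standing in for real search logs, and `embed` is a toy character-trigram counter standing in for a real sentence embedding model from the Hub, so the block runs with only the standard library. Note that this approach can only suggest queries that already exist in the candidate list; it does not generate new text.

```python
import math
from collections import Counter

# Hypothetical past queries; in practice these might come from search logs.
CANDIDATES = [
    "how to install freebsd",
    "how to install freebsd on virtualbox",
    "how to install freebsd packages",
    "how to update freebsd",
    "best laptops for linux",
]


def embed(text, n=3):
    """Toy stand-in for an embedding model: counts of character trigrams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Candidate vectors are computed once, ahead of time, not per keystroke.
INDEX = [(query, embed(query)) for query in CANDIDATES]


def suggest(typed, k=3):
    """Return the k candidates most similar to the text typed so far."""
    typed_vec = embed(typed)
    ranked = sorted(INDEX, key=lambda item: cosine(typed_vec, item[1]), reverse=True)
    return [query for query, _ in ranked[:k]]


if __name__ == "__main__":
    for suggestion in suggest("how to install freebsd"):
        print(suggestion)
```

With a real model, `embed` would be replaced by the model's encode step and the linear scan over `INDEX` by a vector index, but the overall shape (precompute candidates, embed the input, rank by similarity) should stay the same. The per-keystroke cost is then one short embedding plus a nearest-neighbour lookup, which is why this tends to be cheaper than generating tokens.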