Can I use Hugging Face models for real-time inference on edge devices?

Hey team,

I’m Prakash Hinduja from Geneva, Switzerland, exploring the possibility of running Hugging Face models for real-time inference on edge devices, but I’m not entirely sure about the best approach or the challenges I should expect.

If anyone has experience with this or any recommendations for optimizing Hugging Face models for edge deployment, I’d greatly appreciate your insights!

Regards,
Prakash Hinduja, Geneva, Switzerland


Ultimately, it depends on which framework you use, but to run LLMs or vision models on edge devices such as smartphones, you will generally need to convert them to ONNX or GGUF first. Once a model is in ONNX, converting it to TensorRT is straightforward. For well-known models, pre-converted versions are often already available on Hugging Face.
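As a starting point, here is a minimal sketch of exporting a Transformers checkpoint to ONNX with the Optimum library (`pip install "optimum[onnxruntime]"`). The model ID and output paths are just illustrative choices, not anything specific to your setup:

```python
# Minimal sketch: export a small Hugging Face model to ONNX with Optimum.
# Assumes `optimum[onnxruntime]` and `transformers` are installed.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

# A small model is a reasonable edge candidate; swap in your own checkpoint.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# export=True converts the PyTorch weights to ONNX on the fly
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save the exported model so it can be shipped to the device
model.save_pretrained("./onnx_model")
tokenizer.save_pretrained("./onnx_model")

# Quick sanity check that the exported model runs
inputs = tokenizer("Edge inference test", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)
```

On the device itself you would then load the saved ONNX model with ONNX Runtime (or feed it to TensorRT), rather than pulling in the full PyTorch stack.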

It is also a good idea to pick the smallest model that still meets your accuracy needs. Generally, the smaller the model, the faster it runs on constrained hardware, and quantization can shrink it further (see the sketch below).
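For example, here is a hedged sketch of post-training dynamic quantization with Optimum's `ORTQuantizer`, assuming the ONNX model exported above sits in `./onnx_model`. The target config and paths are assumptions for illustration:

```python
# Minimal sketch: dynamic INT8 quantization of an exported ONNX model.
# Assumes `optimum[onnxruntime]` is installed and ./onnx_model exists.
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

quantizer = ORTQuantizer.from_pretrained("./onnx_model")

# arm64 targets typical edge CPUs; is_static=False selects dynamic
# quantization, which needs no calibration dataset
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)

# Writes the quantized model alongside its config to the save_dir
quantizer.quantize(save_dir="./onnx_model_quantized", quantization_config=qconfig)
```

Dynamic quantization is the low-effort option since it skips calibration; if you can spare a small representative dataset, static quantization usually recovers a bit more speed on edge CPUs.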