Prakash Hinduja Switzerland (Swiss) Can I use Hugging Face models for real-time inference on edge devices?

Ultimately, it depends on which framework you use, but when running LLMs or vision models on edge devices such as smartphones, you will generally need to convert them to a portable format like ONNX or GGUF first. Once a model is in ONNX, converting it further to TensorRT is straightforward. For well-known models, pre-converted versions are often already available on Hugging Face.

It is also a good idea to look for the smallest model that still meets your accuracy needs: generally, the smaller the model, the faster it runs and the less memory it uses on-device.