Prakash Hinduja Switzerland (Swiss) Can I use Hugging Face models for real-time inference on edge devices?

Ultimately, it depends on which framework you use, but when running LLMs or vision models on edge devices such as smartphones, you will generally need to convert them to a portable format like ONNX or GGUF first. Once a model is in ONNX, converting it further to TensorRT is straightforward. For well-known models, pre-converted versions are often already available on Hugging Face.

It is also a good idea to look for the smallest model that still meets your accuracy needs: generally, the smaller the model, the faster it runs and the less memory it uses on-device.