Hi @AndreaSottana, that is a very large model, so it takes a long time to load on our Inference API.
The Inference API is intended for testing and evaluation. If you're looking for lower latency, you probably need our dedicated service, Inference Endpoints.
You can read more about how the Hub Inference API works here.
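For reference, here is a minimal sketch of calling the hosted Inference API over HTTP with the standard library; the model id and token are placeholders, and the 503/`estimated_time` retry note reflects how the API reports a model that is still loading:

```python
import json
import urllib.request

# Hosted Inference API endpoint pattern (model id is filled in per call)
API_URL = "https://api-inference.huggingface.co/models/{model_id}"

def build_request(model_id: str, token: str, payload: dict) -> urllib.request.Request:
    """Assemble the authenticated POST request for the Inference API."""
    url = API_URL.format(model_id=model_id)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

def query(model_id: str, token: str, payload: dict):
    """Send the request and decode the JSON response.

    Large models may answer 503 with an `estimated_time` field while
    they load; callers can wait that long and retry.
    """
    req = build_request(model_id, token, payload)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

A very large model can take minutes to become available this way, which is why a dedicated Inference Endpoint is the better fit for latency-sensitive use.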