Cannot run large models using API token

Hi @AndreaSottana, that is a very large model, and it takes a long time to load on our Inference API.
Our Inference API is suitable for testing and evaluation. If you're looking for lower latency, you probably need our dedicated service, Inference Endpoints.
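
In the meantime, if you still want to test it through the hosted API, here's a minimal sketch (the model ID and token below are placeholders, substitute your own). Passing `"wait_for_model": true` in the request options tells the API to hold the request open while the model loads, instead of returning a 503 "model is loading" error:

```python
import requests

# Placeholder model ID and token for illustration only.
API_URL = "https://api-inference.huggingface.co/models/your-org/your-large-model"
headers = {"Authorization": "Bearer hf_xxxxxxxxxxxxxxxx"}

payload = {
    "inputs": "Hello, my name is",
    # Wait for the model to load rather than failing with a 503.
    "options": {"wait_for_model": True},
}

response = requests.post(API_URL, headers=headers, json=payload)
response.raise_for_status()
print(response.json())
```

Note that for very large models the request may still time out before loading finishes, which is why Inference Endpoints (where the model stays loaded on dedicated hardware) is the better fit.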

You can read more about how the Hub Inference API works here.