So now I need a dedicated endpoint to test most models? (only 34k out of 1.6 million supported)

I am very confused by the serverless Inference API, as it seems to have been thoroughly revamped recently. Previously, it was possible to spin up ANY model (perhaps waiting a bit longer if it was not "warm") and then run inference on it…
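For context, this is roughly what that old workflow looked like on my side (a minimal sketch; `gpt2` and the token are just placeholders):

```python
import requests

# Old serverless Inference API: POST to the model's URL and let it load on demand.
API_URL = "https://api-inference.huggingface.co/models/gpt2"   # any model ID used to work
headers = {"Authorization": "Bearer hf_xxx"}                   # your HF access token

payload = {
    "inputs": "The answer to life, the universe and everything is",
    "options": {"wait_for_model": True},  # block until the model is loaded ("warm")
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```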

Now, most models are not supported by any inference provider (only 34k out of 1.6 million models are), which means the only way to test an unsupported model is to deploy it locally (not feasible in many cases) or to run it on a dedicated endpoint. In practice, that means creating a new dedicated endpoint sized for the model, testing a few prompts as quickly as possible, destroying the endpoint to avoid further charges, and then rinsing and repeating for every other model, as in the sketch below.
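Here is a minimal sketch of that create/test/destroy loop with `huggingface_hub`, assuming the model is a text-generation model; the endpoint name, repository, instance size/type, and region are placeholders I would have to adjust per model:

```python
from huggingface_hub import create_inference_endpoint

# Spin up a short-lived dedicated endpoint, run a few test prompts, then tear it down.
endpoint = create_inference_endpoint(
    "scratch-test-endpoint",           # hypothetical endpoint name
    repository="some-org/some-model",  # hypothetical model to test
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    instance_size="x1",                # assumption: pick a size that fits the model
    instance_type="nvidia-a10g",       # assumption: pick hardware that fits the model
    type="protected",
)

try:
    endpoint.wait()  # block until the endpoint is running (billing starts here)
    for prompt in ["Hello, who are you?", "Summarize the theory of relativity."]:
        print(endpoint.client.text_generation(prompt, max_new_tokens=64))
finally:
    endpoint.delete()  # destroy the endpoint immediately to avoid further charges
```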

Am I missing something here?
