On-demand GPU model hosting?


I need to deploy a few small models for demos. The demos will only run for a few hours, off and on, over the course of a month or two. I don’t want to pay for large amounts of hosting, but I absolutely need on-demand full GPU and RAM usage so the models behave correctly during the demos. Any suggestions?


Serverless GPUs from Inferless? Are your demos for running inference?


Some examples include:

  • Runpod
  • Lambda Labs
  • Banana.dev
  • Vast.ai

These tend to offer some of the lowest prices for both on-demand GPUs and spot instances.

If you want an easier way to deploy models, then Inference Endpoints might be worth a look. It includes a scale-to-zero option, which means you don’t pay when there are no requests.
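One thing to plan for with scale-to-zero: after the endpoint has idled down, the first request triggers a cold start, so it may be slow or come back with a 503 while the instance spins up. A small retry wrapper in your demo client handles that gracefully. This is a provider-agnostic sketch, not any specific SDK's API; the 503-on-cold-start behavior and the retry timings are assumptions, so check your provider's docs:

```python
import time


def call_with_cold_start_retry(send, retries=5, delay=2.0):
    """Retry a request while a scale-to-zero endpoint spins up.

    `send` is any zero-argument callable that performs the actual HTTP
    request and returns a (status_code, body) tuple. Many providers
    return 503 while the instance is still cold-starting (an assumption
    here -- verify against your provider).
    """
    status, body = send()
    for _ in range(retries):
        if status != 503:
            return status, body
        time.sleep(delay)  # give the instance time to come up
        status, body = send()
    return status, body
```

In a demo you'd wrap your real call, e.g. `lambda: post_to_endpoint(prompt)`, and you can also fire one throwaway "warm-up" request a minute or two before the demo starts so the audience never sees the cold start.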