On-demand GPU model hosting?


I need to deploy a few small models for demos. The demos will only run for a few hours, off and on, over the course of a month or two. I don’t want to pay for large amounts of hosting, but I absolutely need on-demand full GPU and RAM usage so the models behave correctly during the demos. Any suggestions?


Serverless GPUs from Inferless? Are your demos for running inference?


Some examples include:

  • Runpod
  • Lambda Labs
  • Banana.dev
  • Vast.ai

These tend to offer some of the lowest prices for both on-demand GPUs and spot instances.

If you want an easier way to deploy models, then Inference Endpoints might be worth a look. It includes a scale-to-zero option, which means you don’t pay when there are no requests.
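One thing to plan for with scale-to-zero: after the endpoint has idled down, the first request triggers a cold start, so it may be slow or come back with a 503 while the instance spins up. A small retry wrapper in your demo client handles that gracefully. This is a provider-agnostic sketch, not any specific SDK's API; the 503-on-cold-start behavior and the retry timings are assumptions, so check your provider's docs:

```python
import time


def call_with_cold_start_retry(send, retries=5, delay=2.0):
    """Retry a request while a scale-to-zero endpoint spins up.

    `send` is any zero-argument callable that performs the actual HTTP
    request and returns a (status_code, body) tuple. Many providers
    return 503 while the instance is still cold-starting (an assumption
    here -- verify against your provider).
    """
    status, body = send()
    for _ in range(retries):
        if status != 503:
            return status, body
        time.sleep(delay)  # give the instance time to come up
        status, body = send()
    return status, body
```

In a demo you'd wrap your real call, e.g. `lambda: post_to_endpoint(prompt)`, and you can also fire one throwaway "warm-up" request a minute or two before the demo starts so the audience never sees the cold start.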