I need to deploy a few small models for demos. The demos will only be run for a few hours, off and on, over the course of a month or two. I don’t want to pay for always-on hosting, but I absolutely need on-demand full use of the GPU and RAM so the models behave correctly during the demos. Any suggestions?
These offer the lowest prices for both on-demand and spot GPU instances.
If you want an easier way to deploy models, Inference Endpoints might be worth a look. It includes a scale-to-zero option, so you don’t pay while there are no requests; the trade-off is a cold-start delay on the first request after the endpoint has gone idle.
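While a scaled-to-zero endpoint is waking up, requests typically come back with an HTTP 503 rather than a result, so demo client code should retry until a replica is ready. Here is a minimal retry sketch; the `post` callable, payload shape, and timing parameters are all illustrative assumptions, not part of any specific API:

```python
import time


def query_with_retry(post, payload, max_wait=300, interval=10):
    """Call `post` (any callable returning (status_code, body)) and keep
    retrying while the endpoint returns 503, i.e. while a scaled-to-zero
    replica is still waking up. Gives up after `max_wait` seconds."""
    waited = 0
    while True:
        status, body = post(payload)
        if status != 503:  # anything other than "still scaling up"
            return status, body
        if waited >= max_wait:
            raise TimeoutError("endpoint did not wake up in time")
        time.sleep(interval)
        waited += interval
```

In a real demo script, `post` would wrap an HTTP call (for example with the `requests` library, passing your endpoint URL and authorization token) so the retry logic stays independent of the HTTP client.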
I ran into a similar need and went with a lighter setup: a small always-on VPS that handles preprocessing before sending jobs to a heavier GPU instance. It cut costs, since the GPU didn’t need to be running all the time. You can script the VPS to spin up the GPU server when a demo starts and shut it down afterwards, while the VPS itself keeps running in the background.
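The spin-up/shut-down part of that setup can be a short script. As a sketch, assuming the GPU box is an AWS EC2 instance driven via the `aws` CLI (the instance ID is a placeholder, and the command runner is injectable so the script can be exercised without real AWS credentials):

```python
import subprocess

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder: your GPU instance's ID


def aws_ec2(*args, run=subprocess.run):
    """Invoke an `aws ec2` subcommand; `run` is injectable for testing."""
    cmd = ["aws", "ec2", *args]
    run(cmd, check=True)
    return cmd


def start_gpu(instance_id=INSTANCE_ID, run=subprocess.run):
    """Boot the GPU instance and block until it is actually running."""
    aws_ec2("start-instances", "--instance-ids", instance_id, run=run)
    aws_ec2("wait", "instance-running", "--instance-ids", instance_id, run=run)


def stop_gpu(instance_id=INSTANCE_ID, run=subprocess.run):
    """Stop (not terminate) the instance: the disk is kept,
    but you stop paying for GPU compute while it is off."""
    aws_ec2("stop-instances", "--instance-ids", instance_id, run=run)
```

The always-on VPS would call `start_gpu()` just before a demo and `stop_gpu()` when it ends; the same shape works with any provider that exposes start/stop/wait commands in its CLI.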