I need to deploy a few small models for demos. The demos will only be run for a few hours, off and on, over the course of a month or two. I don’t want to pay for always-on hosting, but I absolutely need on-demand full use of the GPU and RAM so the models behave correctly during the demos. Any suggestions?
These offer the lowest prices for both on-demand and spot GPU instances.
If you want an easier way to deploy models, Inference Endpoints might be worth a look. It includes a scale-to-zero option, so you don’t pay while there are no requests; the trade-off is a cold-start delay on the first request after the endpoint has gone idle.
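While a scaled-to-zero endpoint is waking up, requests typically come back with an HTTP 503 rather than a result, so demo client code should retry until a replica is ready. Here is a minimal retry sketch; the `post` callable, payload shape, and timing parameters are all illustrative assumptions, not part of any specific API:

```python
import time


def query_with_retry(post, payload, max_wait=300, interval=10):
    """Call `post` (any callable returning (status_code, body)) and keep
    retrying while the endpoint returns 503, i.e. while a scaled-to-zero
    replica is still waking up. Gives up after `max_wait` seconds."""
    waited = 0
    while True:
        status, body = post(payload)
        if status != 503:  # anything other than "still scaling up"
            return status, body
        if waited >= max_wait:
            raise TimeoutError("endpoint did not wake up in time")
        time.sleep(interval)
        waited += interval
```

In a real demo script, `post` would wrap an HTTP call (for example with the `requests` library, passing your endpoint URL and authorization token) so the retry logic stays independent of the HTTP client.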
I ran into a similar need and went with a lighter setup: a small always-on VPS that handles preprocessing before sending jobs to a heavier GPU instance. It cut costs, since the GPU didn’t need to be running all the time. You can script the VPS to spin up the GPU server when a demo starts and shut it down afterwards, while the VPS itself keeps running in the background.
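The spin-up/shut-down part of that setup can be a short script. As a sketch, assuming the GPU box is an AWS EC2 instance driven via the `aws` CLI (the instance ID is a placeholder, and the command runner is injectable so the script can be exercised without real AWS credentials):

```python
import subprocess

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder: your GPU instance's ID


def aws_ec2(*args, run=subprocess.run):
    """Invoke an `aws ec2` subcommand; `run` is injectable for testing."""
    cmd = ["aws", "ec2", *args]
    run(cmd, check=True)
    return cmd


def start_gpu(instance_id=INSTANCE_ID, run=subprocess.run):
    """Boot the GPU instance and block until it is actually running."""
    aws_ec2("start-instances", "--instance-ids", instance_id, run=run)
    aws_ec2("wait", "instance-running", "--instance-ids", instance_id, run=run)


def stop_gpu(instance_id=INSTANCE_ID, run=subprocess.run):
    """Stop (not terminate) the instance: the disk is kept,
    but you stop paying for GPU compute while it is off."""
    aws_ec2("stop-instances", "--instance-ids", instance_id, run=run)
```

The always-on VPS would call `start_gpu()` just before a demo and `stop_gpu()` when it ends; the same shape works with any provider that exposes start/stop/wait commands in its CLI.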