I need to deploy a few small models for demos. The demos will only run for a few hours at a time, off and on, over the course of a month or two. I don’t want to pay for huge amounts of hosting, but I absolutely need on-demand access to the full GPU and RAM so the models behave correctly during the demos. Any suggestions?
These offer the cheapest prices for both on-demand GPUs and spot instances.
If you want an easier way to deploy models, Inference Endpoints might be worth a look. It has a scale-to-zero option, which means you don’t pay while there are no requests.
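To get a feel for why scale-to-zero matters for an off-and-on demo schedule, here is a rough back-of-the-envelope comparison. All numbers are hypothetical placeholders (hourly rate, number of sessions, idle timeout before scale-down), not real provider prices, and the model assumes each session is billed for its active time plus one idle timeout.

```python
# Hypothetical numbers for illustration only; check your provider's actual pricing.
HOURLY_RATE_USD = 1.00      # assumed GPU instance price per hour
CAMPAIGN_DAYS = 60          # "a month or two"
SESSIONS = 20               # assumed number of demo sessions
HOURS_PER_SESSION = 3.0     # "a few hours" each
IDLE_TIMEOUT_HOURS = 0.25   # assumed idle time before the endpoint scales to zero


def always_on_cost(days: int, rate: float) -> float:
    """Cost of keeping one GPU instance running 24/7 for the whole period."""
    return days * 24 * rate


def scale_to_zero_cost(sessions: int, hours_per_session: float,
                       idle_timeout: float, rate: float) -> float:
    """Approximate cost when the instance only runs around demo sessions.

    Assumes each session is billed for its active time plus one idle
    timeout before scale-down; cold-start time is ignored.
    """
    return sessions * (hours_per_session + idle_timeout) * rate


if __name__ == "__main__":
    print(f"always on:     ${always_on_cost(CAMPAIGN_DAYS, HOURLY_RATE_USD):,.2f}")
    print(f"scale to zero: ${scale_to_zero_cost(SESSIONS, HOURS_PER_SESSION, IDLE_TIMEOUT_HOURS, HOURLY_RATE_USD):,.2f}")
```

One trade-off to keep in mind for live demos: after the endpoint has scaled to zero, the first request has to wait for the instance to start again, so sending a warm-up request shortly before each demo is a sensible habit.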