Let me say this first: it's reckless. The amount of money you'd need is off by an order of magnitude.
Possible compromises are to use a weaker model with similar characteristics, or to accept hopelessly slow speed and load the model into RAM or onto an SSD instead of VRAM and run inference from there. RAM is cheap these days. Consumer editions of Windows do cap RAM at 128GB, but…
Hugging Face’s accelerate library can also run models stored on an SSD.
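As a rough idea of what that looks like, here is a minimal sketch of disk offload through transformers + accelerate. The model ID is a hypothetical placeholder, and the offload folder path is something you pick yourself; device_map="auto" fills VRAM first, spills to CPU RAM, and pages whatever still doesn't fit out to the folder on disk.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/some-reasoning-model"  # hypothetical placeholder, not a real model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets accelerate place layers on GPU first, then CPU RAM,
# then offload the remainder to the SSD folder below. Anything served from
# disk will be very slow, as noted above.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    offload_folder="offload",   # weights that fit nowhere else are paged to this folder (put it on the SSD)
    offload_state_dict=True,    # keep the checkpoint on disk while loading to limit peak RAM use
    torch_dtype="auto",
)

prompt = "Explain, step by step, why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Don't expect interactive speeds from this: every forward pass has to pull the offloaded weights off the SSD, so generation can take minutes per response. It's a way to run a model at all, not a way to run it well.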
For example, reasoning-focused models with characteristics similar to o1 include the following.