Would it be possible to run a public-facing Hugging Face Space that runs large models on CPU? It would let people take a quick look at whether a model will run before they download it.
The VRAM requirements can get pretty high, but when paired with additional system memory the model still runs, which would be good enough as a test.
Running entirely from RAM is limited to RAM speed, but consider splitting the model: half loaded into RAM and half into VRAM. If the VRAM half were effectively instantaneous, only half the weights would be read at RAM speed, so the split setup would run about twice as fast as the model running in RAM alone.
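The speedup argument above can be sketched as a back-of-the-envelope calculation. The bandwidth figures and the one-full-weight-read-per-token model below are illustrative assumptions, not measurements:

```python
# Rough estimate of token-generation time when model weights are split
# between VRAM and system RAM. All numbers are illustrative assumptions.

def time_per_token(model_gb, ram_fraction, ram_bw_gbs, vram_bw_gbs):
    """Seconds per token, assuming one full read of the weights per token."""
    ram_gb = model_gb * ram_fraction
    vram_gb = model_gb * (1 - ram_fraction)
    return ram_gb / ram_bw_gbs + vram_gb / vram_bw_gbs

MODEL_GB = 100       # the 100GB model mentioned above
RAM_BW = 50          # GB/s, assumed dual-channel DDR5-class bandwidth
VRAM_BW = 1e9        # effectively "instantaneous" for the thought experiment

ram_only = time_per_token(MODEL_GB, 1.0, RAM_BW, VRAM_BW)
half_half = time_per_token(MODEL_GB, 0.5, RAM_BW, VRAM_BW)
print(ram_only / half_half)  # roughly 2x speedup with half the weights in VRAM
```

In practice VRAM is fast but not instantaneous, so the real speedup from a 50/50 split would be somewhat below 2x.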
It would be a helpful addition for testing whether 100GB models will load.