Why can't the BLOOM model be run (really slowly) on consumer hardware?

I read in the community forum for BLOOM on Hugging Face that you need around 400GB of GPU memory to run inference. Why can't you just keep the weights on an SSD and split the work into 400/8 = 50 chunks with an 8GB consumer GPU?

I’m sorry if this is a really dumb question.

As long as you have the disk space, the model can be run on any setup (albeit slowly) with Accelerate. Some users have run it on two GPUs, for instance.

With just 8GB of GPU memory, however, you will be limited: it's possible the largest layer of the model does not fit.
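
Concretely, the disk-offload setup looks roughly like the sketch below. This is a sketch, not a definitive recipe: the `offload` folder name is arbitrary, and the smaller `bigscience/bloom-7b1` checkpoint is used here just for illustration (swap in `bigscience/bloom` for the full model if you have the disk space).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"  # illustrative; the full model is "bigscience/bloom"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",           # let Accelerate spread layers across GPU, CPU RAM, and disk
    offload_folder="offload",    # weights that fit nowhere in memory get spilled here
    torch_dtype=torch.float16,   # half-precision weights roughly halve the footprint
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```

With `device_map="auto"`, Accelerate loads each layer's weights on demand, which is why generation is slow but still possible; the hard constraint is that any single layer must fit on the GPU at once.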


Thank you!