Prerequisites to run BLOOM locally?

Can anyone tell me how much RAM, GPU RAM, and disk space is required to run BLOOM locally? I tried to run it and it has downloaded 180 GB of data so far and is still downloading. If the download finishes, what are the chances of it running locally?
I have an RTX 3070.

I haven’t had a chance to try it yet, but I’ve read in the official Slack channel that it requires something like 8*80GB A100s or 16*40GB A100s to run inference locally.

And according to the “how to use” section of the model card, you need not only transformers installed but also the accelerate library; see the sketch below.
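
Something like the following should work as a starting point. This is only a sketch: the dtype, prompt, and generation settings are my own assumptions, not taken verbatim from the model card.

```python
# Install first: pip install transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloom"  # full 176B model; smaller variants also exist

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",           # let accelerate decide where each layer goes
    torch_dtype=torch.bfloat16,  # half-precision weights to save memory
)

# Works when the first layers fit on the first device accelerate picked.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```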

PS: Check out this quantized version of BLOOM if the original model doesn’t fit on your hardware.

You can run it on less than this, as long as you have enough disk space (and plenty of time to wait). Accelerate automatically offloads weights to the CPU if there is no more space on the GPU, and then to the disk if there is no more CPU RAM. Roughly like the sketch below.
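
Here is a hedged sketch of that offloading setup; the offload folder name and dtype are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",              # fill GPU VRAM, then CPU RAM, then disk
    torch_dtype=torch.bfloat16,
    offload_folder="bloom-offload", # weights that fit nowhere else land here
    offload_state_dict=True,        # reduce peak CPU RAM usage while loading
)

# Inspect where each block of layers ended up (GPU index, "cpu", or "disk").
print(model.hf_device_map)
```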

For your reference, I can run inference locally on 8*48GB A6000 GPUs using the Accelerate package (roughly as in the sketch below). I’m also wondering if there is a way to distribute the model’s layers across two machines.
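
On a single multi-GPU machine you can cap the memory used per card with the max_memory argument, something like this sketch. The per-card and CPU budgets are illustrative assumptions, not my exact settings, and as far as I know Accelerate only splits layers across the GPUs of one machine; it does not pipeline a model across two separate machines out of the box.

```python
import torch
from transformers import AutoModelForCausalLM

max_memory = {i: "46GiB" for i in range(8)}  # leave some headroom per 48GB card
max_memory["cpu"] = "100GiB"                 # optional spill-over to CPU RAM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",
    max_memory=max_memory,
    torch_dtype=torch.bfloat16,
)
```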

In case it helps, I wrote a blog post that shows how to run BLOOM (the largest, 176B version) on a desktop computer, even if you don’t have a GPU. On my computer (i5 11th gen, 16GB RAM, 1TB Samsung 980 Pro SSD), generation takes 3 minutes per token using only the CPU, which is a little slow but manageable. See the blog post link below.