Prerequisites to run BLOOM locally?

Can anyone tell me how much RAM, GPU RAM, and disk space is required to run BLOOM locally? I have tried to run it, and it has downloaded 180 GB of data so far with the download still in progress. If it finishes, what are the chances of running it locally?
I have an RTX 3070.

I haven’t had a chance to try it yet, but I’ve read in the official Slack channel that it requires something like 8×80GB A100s or 16×40GB A100s to perform inference locally.
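Those figures line up with a back-of-the-envelope estimate: BLOOM has 176B parameters, so at 2 bytes per parameter (fp16/bf16) the weights alone need roughly 352 GB, before counting activations or the KV cache. A quick sketch (pure arithmetic, no library calls):

```python
# Rough memory estimate for the BLOOM-176B weights only
# (activations, KV cache, and framework overhead come on top).
PARAMS = 176e9

def weights_gb(bytes_per_param: float) -> float:
    """Gigabytes needed just to hold the weights."""
    return PARAMS * bytes_per_param / 1e9

fp16 = weights_gb(2)  # half precision
int8 = weights_gb(1)  # 8-bit quantized

print(f"fp16 weights: {fp16:.0f} GB")  # 352 GB -> ~8x80GB or 16x40GB A100s
print(f"int8 weights: {int8:.0f} GB")  # 176 GB -> still far beyond one RTX 3070
```

Even the 8-bit quantized weights are an order of magnitude larger than the 8 GB of VRAM on an RTX 3070, which is why offloading (discussed below) is the only way to run it on consumer hardware.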

And according to the “how to use” section of the model card, you should not only have transformers installed but also the accelerate library.

PS: Check out this quantized version of BLOOM if the original model doesn’t fit on your hardware.

You can run it on less than this, as long as you have enough disk space (and plenty of time to wait): Accelerate automatically offloads weights to the CPU if there is no more space on the GPU, and then to disk if there is no more CPU RAM.

For your reference, I can run it on 8×48GB A6000 GPUs to perform inference locally, using the Accelerate package. I am also wondering if there is a way to distribute the model layers across two machines.

In case it helps, I wrote a blog post that shows how to run BLOOM (the largest, 176B version) on a desktop computer, even if you don’t have a GPU. On my computer (i5 11th gen, 16GB RAM, 1TB Samsung 980 Pro SSD), generation takes about 3 minutes per token using only the CPU, which is a little slow but manageable. See the blog post link below.


That’s really nice that Accelerate enables that!
But I wonder how compatible it is with Slurm.
I tried to load BLOOM on a single node with 1 GPU on the Jean Zay cluster, but model loading was killed because of an out-of-memory error:
“slurmstepd: error: Detected 1 oom-kill event(s) in StepId=992739.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.”

I was expecting Accelerate to offload the model weights to disk when there’s not enough RAM, but Slurm’s OOM detector killed it first!

Any idea how to make Accelerate work with Slurm?
Thank you!

Sorry, my bad: on this shared cluster, Accelerate looks at the full RAM available, but the Jean Zay environment kills the job when one process on a shared node reaches some limit (I don’t know exactly how much). So I had to set a memory limit for Accelerate with:

from transformers import AutoModelForCausalLM

max_memory_mapping = {"cpu": "24GB", 0: "14GB"}
mod = AutoModelForCausalLM.from_pretrained(
    "/pathto/model",
    low_cpu_mem_usage=True,
    device_map="auto",
    offload_folder=offload_dir,
    max_memory=max_memory_mapping,
)

Then it works (although very slowly obviously…)

If I add enough RAM to my system to load the whole 330 GB into memory, how long do you think a token would take to generate? A couple of seconds, or would I just be completely CPU-bound at that point? I’d be building a system from scratch primarily for this purpose, but more than one higher-end RTX card is out of my budget.

Having enough RAM to hold the entire model would reduce the execution time; however, you would still be CPU-bound. I did a quick test and, once a BLOOM block is in RAM, my CPU (i5 11th gen) takes on average 0.45 s to run a forward pass on a single BLOOM block. Therefore, assuming all 70 blocks are already in RAM, you could expect around 70 × 0.45 s = 31.5 s per token.
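That estimate generalizes to a tiny helper (the block count and 0.45 s per-block latency are the figures from the test above; your CPU's numbers will differ):

```python
def seconds_per_token(n_blocks: int = 70, sec_per_block: float = 0.45) -> float:
    """Estimated wall time per generated token, assuming all transformer
    blocks are already resident in RAM and run sequentially on the CPU."""
    return n_blocks * sec_per_block

per_token = seconds_per_token()
print(f"{per_token:.1f} s per token")                     # 31.5 s
print(f"{100 * per_token / 60:.1f} min for 100 tokens")   # 52.5 min
```

So even with the whole model in RAM, generating a 100-token reply would take on the order of an hour on a CPU of that class.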
