I have noticed that when I load the 70B model (specifically LLaMA-2) onto the CPU with both the `low_cpu_mem_usage=True` and `torch_dtype="auto"` flags, it has almost no effect on CPU memory usage. However, if I remove either of these flags, loading consumes a significant amount of memory. I am curious about the reason behind this behavior. Is there any memory-mapping happening in the background? And if so, when is it triggered? I would really appreciate it if you could help me understand this better.
Thank you for your time!
Hi @hanguo, I tested it and it has something to do with the safetensors format + fp16 (`torch_dtype="auto"` will set `torch_dtype=torch.float16`) + `low_cpu_mem_usage=True`. Memory consumption also skyrockets when we use the PyTorch bin format (pytorch bin + fp16 + `low_cpu_mem_usage=True`). Maybe try another channel, as this is related to the integration of llama2 and safetensors?
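For some intuition on why the safetensors path can look almost free in terms of resident memory: safetensors files are read via memory-mapping, so the OS reserves address space for the whole file but only faults pages into physical memory when the data is actually touched. Here is a minimal stdlib sketch of that lazy-loading behavior (this is not the safetensors implementation itself, just an illustration of `mmap` semantics; file name and size are arbitrary for the demo):

```python
import mmap
import os
import tempfile

# Create a dummy "weights" file (16 MiB of zeros) for the demo.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * (16 * 1024 * 1024))

with open(path, "rb") as f:
    # Memory-map the file: this reserves virtual address space
    # but reads no data from disk yet.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    file_size = len(mm)   # full file size is visible immediately
    first_byte = mm[0]    # touching a byte faults in only that page (~4 KiB)
    mm.close()
```

In the same way, loading a safetensors checkpoint can map tens of gigabytes without the process's resident set growing much; memory is consumed only as tensors are actually materialized.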
Thanks for the response! What's a good alternative channel for questions like this?
Maybe the transformers channel, as they might have more insight into safetensors and how llama was implemented in transformers!