I have noticed that when I load the 70B model (specifically LLaMA-2) onto the CPU with the `low_cpu_mem_usage=True` and `torch_dtype="auto"` flags, loading has almost no effect on CPU memory usage. However, if I remove either of these flags, it consumes a significant amount of memory. I am curious about the reason behind this behavior. Is there any memory-mapping happening in the background, and if so, when does it trigger? I would really appreciate it if you could help me understand this better.
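To make the question concrete: memory-mapping means a file's bytes are not copied into RAM up front; the OS only faults physical pages in when a region is actually touched, so resident memory stays low until the tensors are read. Here is a minimal stdlib sketch of that mechanism (a generic illustration, not the actual transformers/safetensors internals; the 64 MiB dummy file stands in for a weights shard):

```python
import mmap
import os
import tempfile

# Create a 64 MiB file of zeros to stand in for a weights shard.
path = os.path.join(tempfile.mkdtemp(), "shard.bin")
with open(path, "wb") as f:
    f.truncate(64 * 1024 * 1024)

with open(path, "rb") as f:
    # mmap maps the file into virtual address space without reading it;
    # physical pages are only faulted in when a slice is accessed.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_kb = mm[:1024]  # only now does the OS page in this region
    print(len(mm), len(first_kb))  # 67108864 1024
    mm.close()
```

The mapped length equals the full file size, but resident memory grows only with the regions you actually slice.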
This was discussed in a different channel, but I was advised to post it here as well.
@marcsun13 noticed that it might have something to do with the safetensors format + fp16 (`torch_dtype="auto"` will set `torch_dtype=torch.float16`) + `low_cpu_mem_usage=True`. Memory consumption also skyrockets when we use the PyTorch bin format (pytorch bin + fp16 + `low_cpu_mem_usage=True`).
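For what it's worth, the safetensors file layout is what makes the low-memory path possible at all: the file is an 8-byte little-endian header length, a JSON header with each tensor's dtype/shape/byte offsets, and then one flat buffer of raw tensor bytes, so individual tensors can be sliced straight out of a memory-mapped file with no unpickling (unlike the pickle-based pytorch bin format). A minimal stdlib sketch of that layout (a hand-rolled writer/reader for illustration, not the safetensors library itself; names and the F16 dtype label are just placeholders):

```python
import json
import struct

def write_safetensors_like(path, tensors):
    """Write {name: raw_bytes} in a safetensors-style layout (F16 dtype assumed)."""
    header, offset, payload = {}, 0, b""
    for name, raw in tensors.items():
        header[name] = {"dtype": "F16", "shape": [len(raw) // 2],
                        "data_offsets": [offset, offset + len(raw)]}
        offset += len(raw)
        payload += raw
    blob = json.dumps(header).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)))  # 8-byte little-endian header size
        f.write(blob)                          # JSON header
        f.write(payload)                       # one flat buffer of raw tensor bytes

def read_tensor(path, name):
    """Fetch one tensor's raw bytes by seeking, without loading the whole file."""
    with open(path, "rb") as f:
        (hlen,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(hlen))
        start, end = header[name]["data_offsets"]  # offsets are relative to the data buffer
        f.seek(8 + hlen + start)
        return f.read(end - start)

write_safetensors_like("demo.safetensors", {"w": b"\x01\x02\x03\x04", "b": b"\x05\x06"})
print(read_tensor("demo.safetensors", "b"))  # b'\x05\x06'
```

Because every tensor is a contiguous, known byte range, a loader can mmap the file and hand each tensor a zero-copy view; a pickled `.bin` has to be deserialized as a whole, which is presumably why that path blows up memory even with `low_cpu_mem_usage=True`.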
Thank you for your time!