CPU Memory Usage with `low_cpu_mem_usage=True` and `torch_dtype="auto"` Flags

Hi @hanguo, I tested and it has something to do with the safetensors format + fp16 (`torch_dtype="auto"` will set `torch_dtype=torch.float16`) + `low_cpu_mem_usage=True`. The memory consumption also skyrockets when we use the PyTorch bin format (pytorch bin + fp16 + `low_cpu_mem_usage=True`). Maybe try asking in another channel, as this is related to the integration of Llama 2 and safetensors?
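
In case it helps reproduce the comparison, here is a minimal sketch of how I'd measure the peak CPU memory growth around the load. The `from_pretrained` call is only shown in a comment (the model name and flags there are illustrative); the actual allocation is simulated with a plain `bytearray` so the snippet runs standalone, and `ru_maxrss` is assumed to be in KB (Linux):

```python
import resource


def peak_rss_mb() -> float:
    """Peak resident set size of this process in MB.

    Assumes Linux, where ru_maxrss is reported in KB
    (on macOS it is in bytes).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def load_checkpoint():
    # Placeholder for the real call under test, e.g.:
    #   AutoModelForCausalLM.from_pretrained(
    #       "meta-llama/Llama-2-7b-hf",   # illustrative model id
    #       torch_dtype="auto",           # resolves to torch.float16 here
    #       low_cpu_mem_usage=True,
    #   )
    # Simulated here with ~50 MB of zeroed bytes standing in for weights.
    return bytearray(50 * 1024 * 1024)


before = peak_rss_mb()
weights = load_checkpoint()
after = peak_rss_mb()
print(f"peak RSS grew by ~{after - before:.0f} MB")
```

Running this once with safetensors + fp16 and once with the pytorch bin format (with and without `low_cpu_mem_usage=True`) should make the difference you're seeing visible as a single number per configuration.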