I am trying to load the pretrained mt5-xl on a GCP VM with 4 vCPUs, 15 GB memory, and an NVIDIA Tesla L4 GPU with 24 GB GPU memory.

Code-wise it is simply:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("google/mt5-xl")
```
Model loading fails with `RuntimeError: unable to mmap 14970735570 bytes from file` on both CPU-only and GPU VMs, which suggests this is a model-loading (CPU/RAM) failure rather than a GPU out-of-memory issue.
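For scale (my own arithmetic, nothing from the traceback beyond the byte count): the size in the error message is essentially this VM's entire RAM, which would explain a CPU-side failure before the GPU is ever involved:

```python
# Size from the RuntimeError vs. this VM's provisioned RAM (stdlib-only arithmetic).
mmap_bytes = 14_970_735_570   # bytes torch tried to mmap, per the error
vm_ram_bytes = 15 * 10**9     # 15 GB, as the GCP VM is provisioned

print(f"checkpoint mmap: {mmap_bytes / 2**30:.1f} GiB")    # ~13.9 GiB
print(f"VM RAM:          {vm_ram_bytes / 2**30:.1f} GiB")  # ~14.0 GiB
```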
Model loading seems to hit the CPU/RAM bottleneck before the model is ever placed on the GPU (I monitored GPU memory usage with `nvidia-smi`). I have come across the Accelerate utilities and Big Modeling docs, so I am working through debugging `device_map` and `max_memory` to make this work.
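For reference, the Accelerate-style loading I am experimenting with looks roughly like this; the memory caps and the fp16 dtype are my own guesses for this machine, not values I have verified:

```python
# My guess at sensible caps for a 24 GB L4 plus 15 GB of system RAM,
# leaving headroom for the CUDA context and the OS.
MAX_MEMORY = {0: "20GiB", "cpu": "10GiB"}

def load_mt5_xl():
    # Heavy dependencies imported lazily so the sketch itself stays lightweight.
    import torch
    from transformers import AutoModel

    return AutoModel.from_pretrained(
        "google/mt5-xl",
        torch_dtype=torch.float16,  # halve the weight footprint while loading
        device_map="auto",          # let Accelerate spread layers across GPU/CPU
        max_memory=MAX_MEMORY,
        offload_folder="offload",   # spill layers that still don't fit to disk
    )
```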
But if there are other solutions or workarounds available, they would be much appreciated.

Passing `low_cpu_mem_usage=True` to `from_pretrained` did not make a difference.
With `offload_state_dict=True`, about 25,000 MB got loaded to the GPU before I hit the same error.
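One workaround I am considering (my assumption: the mmap fails because the kernel cannot allocate the mapping with so little free RAM and no swap) is adding a swap file so the ~15 GB state dict can be staged:

```shell
# Add 16 GB of swap (assumes the boot disk has the space).
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
free -h   # verify the swap is active
```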