Inference with CPU offload

I want to run inference with the Falcon40b model on a GPU, using CPU offload.
I pass device_map="auto" to the AutoModelForCausalLM.from_pretrained() method.
I expected as much of the model as possible to be placed on the GPU, with only the remainder offloaded to the CPU.
However, when I checked memory consumption, only 414 MB of the 40 GB of VRAM (one A100) was in use, while RAM usage was close to 100%. So it seems the model is almost entirely offloaded to the CPU.
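Besides watching VRAM and RAM usage, the placement can be checked directly. When a model is loaded with a device_map, transformers attaches the resulting map as `model.hf_device_map`, a dict from module name to a GPU index, "cpu" or "disk". The sketch below counts modules per device. The helper name `summarize_device_map` and the example module names are hypothetical, chosen only for illustration.

```python
from collections import Counter


def summarize_device_map(device_map):
    """Count how many modules were placed on each device.

    Assumes `device_map` has the shape of `model.hf_device_map`:
    module name -> GPU index (int), "cpu" or "disk".
    Device keys are converted to strings so ints and strings sort together.
    """
    counts = Counter(str(device) for device in device_map.values())
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical map for illustration; a real one would come from
    # `model.hf_device_map` after from_pretrained(..., device_map="auto").
    example = {
        "transformer.word_embeddings": 0,
        "transformer.h.0": 0,
        "transformer.h.1": "cpu",
        "transformer.h.2": "cpu",
        "transformer.ln_f": "cpu",
        "lm_head": "cpu",
    }
    print(summarize_device_map(example))  # {'0': 2, 'cpu': 4}
```

Running it on the real `model.hf_device_map` would show at a glance how many modules ended up on GPU 0 versus the CPU.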

How can I make the GPU the primary device, so that layers are offloaded to the CPU only once there is no space left on the GPU?