How to see which parts of a model are offloaded to CPU?

I am loading llama-65b for inference with device_map="auto". Is there a way to check which layers are actually offloaded? Also, is there a way to specify which parts of the model to offload? I am not using DeepSpeed since I am on an ARM64 machine (GH200) and DeepSpeed doesn't support ARM yet.
I am loading the model like this:

model = AutoModelForCausalLM.from_pretrained("/models/LLAMA-HF/llama-65b-hf/", device_map="auto")

@khayamgondal you can design your own device map as a dictionary, for example:

device_map = {"block1": 0, "block2.linear1": 0, "block2.linear2": 1, "block2.linear3": 1}

where 0 and 1 are device (GPU) identifiers; the values "cpu" and "disk" are also accepted. This way, you can decide which modules of your model are placed on which GPU, offloaded to CPU RAM, or offloaded to disk.
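As for the first part of your question: after loading, transformers records the placement that was actually used in model.hf_device_map, so you can print that to see what ended up on CPU. You can also steer the automatic placement with the max_memory argument of from_pretrained. Here is a minimal sketch; the GiB budgets and the layer-70 cutoff are placeholder values, and the module names assume the HF LLaMA layout, so check what hf_device_map prints for your checkpoint before building a manual map:

from transformers import AutoModelForCausalLM

path = "/models/LLAMA-HF/llama-65b-hf/"

# Option 1: keep device_map="auto" but cap per-device memory, so anything
# that does not fit in the GPU 0 budget is offloaded to CPU RAM.
# (The GiB figures are placeholders; tune them to your hardware.)
model = AutoModelForCausalLM.from_pretrained(
    path,
    device_map="auto",
    max_memory={0: "70GiB", "cpu": "400GiB"},
)

# hf_device_map records where each module actually landed:
# a GPU index, "cpu", or "disk".
for name, device in model.hf_device_map.items():
    print(f"{name} -> {device}")

# Option 2: full manual control -- use the printed map as a template and
# edit the values, e.g. move the last ten decoder layers (70-79 in the
# 80-layer llama-65b) to CPU. The exact keys depend on how the auto map
# split the model, so adapt the condition to what you see printed.
custom_map = dict(model.hf_device_map)
for name in custom_map:
    if name.startswith("model.layers.") and int(name.split(".")[2]) >= 70:
        custom_map[name] = "cpu"
model = AutoModelForCausalLM.from_pretrained(path, device_map=custom_map)

Reloading a 65B checkpoint twice is expensive, of course; in practice you would print the auto map once, save the edited dictionary, and pass it directly on subsequent loads.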

Unrelated question: where did you read that DeepSpeed does not support ARM and GH200 chips specifically?