I am currently fine-tuning an LLM (LLaMA) and would like to retrieve the gradient of each weight (parameter) after every gradient update. However, I notice that during training the weights are (auto-)wrapped into names like "_fsdp_wrapped_module._flat_param". I need to map these wrapped weights back to the original LLaMA architecture names, such as "self_attn.v_proj". Any code examples?
I guess "summon_full_params()" might be the function I am looking for, but I am not sure if that is correct, and I am also having difficulty using it. Thanks a lot for any help!
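For reference, here is roughly what I have been trying. This is only a sketch of what I imagine should work; I am not sure whether `with_grads=True` is the right argument here (or whether it needs `use_orig_params=True` in the FSDP constructor), and the prefix-stripping is just my guess at how to recover the original names:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def log_grads(model):
    # Called right after optimizer.step(), before zero_grad().
    # with_grads=True should (I think) also gather the sharded gradients.
    with FSDP.summon_full_params(model, with_grads=True):
        for name, param in model.named_parameters():
            # Strip the FSDP wrapper prefix so the name hopefully looks like
            # "model.layers.0.self_attn.v_proj.weight" again.
            clean_name = name.replace("_fsdp_wrapped_module.", "")
            if param.grad is not None:
                print(clean_name, param.grad.norm().item())
```

Is this the intended way to use `summon_full_params()`, or is there a cleaner way to get per-parameter gradients with the original LLaMA names?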