Hi, I am trying to use FSDP via the HF Trainer.
I am not launching the job with Accelerate because I am using Ray. However, most of the tutorials and documentation (example 1, example 2) assume the job will be launched with Accelerate and therefore demonstrate creating a config with `accelerate config --config_file fsdp_config.yaml`. They show a resulting config that looks something like:
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: false
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
I am trying to create a comparable config via the Trainer's arguments `fsdp` and `fsdp_config`. I am able to line up most of the arguments from Accelerate's `fsdp_config.yaml`, but for others I am less sure (my attempted mapping is sketched after the questions below). I was hoping to get some guidance on the following:
- Does `fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP` map to the `"auto_wrap"` option, passed via `TrainingArguments.fsdp`?
- Does `fsdp_offload_params: true` map to the `"offload"` option, passed via `TrainingArguments.fsdp`?
- I don't see how to specify `fsdp_state_dict_type` at all via `TrainingArguments`… does it default to `SHARDED_STATE_DICT`?
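For reference, here is roughly what I have so far. The values in the `fsdp` string and the keys in the `fsdp_config` dict are my best guesses from the `TrainingArguments` docstring, so they may not be the correct mapping:

```python
from transformers import TrainingArguments

# My attempted translation of the Accelerate YAML above into Trainer arguments.
# The fsdp string and the fsdp_config keys are guesses -- corrections welcome.
training_args = TrainingArguments(
    output_dir="./output",
    # "full_shard" for fsdp_sharding_strategy: FULL_SHARD, and (I assume)
    # "auto_wrap" for fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP.
    # Would adding "offload" here be the equivalent of fsdp_offload_params: true?
    fsdp="full_shard auto_wrap",
    fsdp_config={
        "backward_prefetch": "backward_pre",
        "forward_prefetch": False,
        "cpu_ram_efficient_loading": True,
        "sync_module_states": True,
        "use_orig_params": False,
        # Is this also needed for transformer-based wrapping, or is it inferred
        # from the model? ("LlamaDecoderLayer" is just a placeholder here.)
        # "transformer_layer_cls_to_wrap": "LlamaDecoderLayer",
        # I don't see a key for fsdp_state_dict_type, hence my last question above.
    },
)
```

Since I am launching with Ray rather than `accelerate launch`, this `TrainingArguments` object is what gets built inside each Ray worker and handed to the `Trainer`.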