How to specify FSDP config without launching via Accelerate

Hi, I am trying to use FSDP via the HF Trainer.

I am not launching the job with Accelerate because I am using Ray. However, most of the tutorials and documentation (example 1, example 2) assume the job will be launched with Accelerate and therefore demonstrate creating a config with accelerate config --config_file fsdp_config.yaml. They show a resulting config that looks something like:

fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: false
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false

I am trying to create a comparable config via the Trainer’s arguments: fsdp and fsdp_config. I am able to line up most of the arguments from Accelerate’s fsdp_config.yaml, but for others I am less sure. I was hoping to get some guidance on the following (a sketch of what I am currently passing follows the list):

  • Does fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP map to the argument "auto_wrap", passed via TrainingArguments.fsdp?
  • Does fsdp_offload_params: true map to the argument "offload", passed via TrainingArguments.fsdp?
  • I don’t see how to specify fsdp_state_dict_type at all via TrainingArguments… does it default to SHARDED_STATE_DICT?
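Here is that sketch. The fsdp_config keys and their mapping to the Accelerate YAML above are my best guesses (exact key names seem to vary between transformers versions), which is exactly what I would like confirmed; the output_dir and the layer class name are just placeholders for whatever model is being trained.

from transformers import TrainingArguments

# Best-guess translation of the Accelerate fsdp_config.yaml above.
#   "full_shard" -> fsdp_sharding_strategy: FULL_SHARD
#   "auto_wrap"  -> fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP  (question 1)
#   "offload"    -> fsdp_offload_params: true                      (question 2;
#                   omitted here because the YAML above sets it to false)
training_args = TrainingArguments(
    output_dir="./fsdp-run",  # placeholder
    fsdp="full_shard auto_wrap",
    fsdp_config={
        # Key names are my guesses; some transformers versions accept them
        # with an "fsdp_" prefix, others without.
        "backward_prefetch": "backward_pre",
        "forward_prefetch": False,
        "use_orig_params": False,
        "sync_module_states": True,
        "cpu_ram_efficient_loading": True,
        # Hypothetical layer class; replace with your model's decoder block.
        "transformer_layer_cls_to_wrap": ["LlamaDecoderLayer"],
        # No obvious counterpart for fsdp_state_dict_type (question 3).
    },
)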

For anyone else who lands here: you can dig through the __post_init__ method of the Hugging Face TrainingArguments class to figure out how the FSDP args map between HF and Accelerate.
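Concretely, something like the sketch below. The output_dir is a throwaway path, and the detail that the mapping ends up as FSDP_* environment variables (which Accelerate’s FSDP plugin later reads) is what I observed on the versions I checked; it may differ on yours.

import inspect
import os

from transformers import TrainingArguments

# Pull the source of __post_init__ and keep only the FSDP-related lines to see
# how the Trainer-side args are translated into Accelerate's settings.
src = inspect.getsource(TrainingArguments.__post_init__)
print("\n".join(line for line in src.splitlines() if "fsdp" in line.lower()))

# On the versions I checked, part of that translation is exported as FSDP_*
# environment variables, so instantiating TrainingArguments and dumping those
# variables shows what a given config resolves to.
args = TrainingArguments(
    output_dir="/tmp/fsdp-mapping-check",  # throwaway path
    fsdp="full_shard",
)
print({k: v for k, v in os.environ.items() if k.startswith("FSDP")})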


Not many people here know Trainer and FSDP well. Or rather, I don’t know because I don’t use them!
People who seem to be the Trainer maintainers come to the forum from time to time, so you could look through their past posts and send them a mention (@ plus the username).
