### System Info
```Shell
compute_environment: LOCAL_MACHINE …
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: false
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: "BertEmbeddings,BertLayer,BertPooler"
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: "no"
num_machines: 1
num_processes: 5
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [X] My own task or dataset (give details below)
### Reproduction
I'm fine-tuning a text embedding model with the [sentence-transformers library](https://sbert.net/index.html). The model is [gte-base](https://huggingface.co/thenlper/gte-base/tree/main); from its config.json we can tell it is built on `BertModel`. On a toy dataset, training runs smoothly with DDP via Accelerate, since the model fits on a single GPU. When I switch to FSDP (also launched via Accelerate), different kinds of errors pop up. After several hours of exploration, I have narrowed the issue down to the `fsdp_transformer_layer_cls_to_wrap` argument. A simplified sketch of the setup is below, followed by what I have tried.
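Roughly, the training script looks like this (a minimal sketch, not my exact code; the toy anchor/positive dataset is only illustrative and assumes the sentence-transformers v3 trainer API):
```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("thenlper/gte-base")

# Toy anchor/positive pairs standing in for the real dataset.
train_dataset = Dataset.from_dict({
    "anchor": ["what does FSDP do?", "capital of France"],
    "positive": ["FSDP shards parameters across GPUs.", "Paris is the capital of France."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="out",
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()
```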
1. Not specifying anything for the argument:
```
File "/nethome/speng65/miniconda3/envs/meta/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/nethome/speng65/miniconda3/envs/meta/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 164, in forward
return F.embedding(
File "/nethome/speng65/miniconda3/envs/meta/lib/python3.10/site-packages/torch/nn/functional.py", line 2267, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'weight' must be 2-D
```
Apparently the error comes from the `nn.Embedding` layer: FSDP flattens the parameters into a 1-D tensor, but in the forward pass the original 2-D view of the embedding weight is apparently not restored.
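As a sanity check, the same message can be reproduced outside FSDP by handing `F.embedding` a flattened weight (this just illustrates what the error means, not a claim about what FSDP does internally):
```python
import torch
import torch.nn.functional as F

weight = torch.randn(30522, 768)          # a normal 2-D embedding table
input_ids = torch.tensor([[101, 2054, 102]])

F.embedding(input_ids, weight)            # fine
F.embedding(input_ids, weight.flatten())  # RuntimeError: 'weight' must be 2-D
```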
2. Following most of the tutorials, I set `fsdp_transformer_layer_cls_to_wrap: BertLayer` and got exactly the same error.
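My (possibly wrong) understanding is that this setting gets translated into a `transformer_auto_wrap_policy`, so only `BertLayer` instances receive their own FSDP unit and everything else, embeddings and pooler included, ends up flattened into the root unit. Roughly:
```python
import functools

from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers.models.bert.modeling_bert import BertLayer

# Roughly what `fsdp_transformer_layer_cls_to_wrap: BertLayer` turns into
# (my reading of the Accelerate FSDP plugin, so take it with a grain of salt):
auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={BertLayer},
)
```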
3. So I added the embedding class to the argument: `fsdp_transformer_layer_cls_to_wrap: "BertEmbeddings,BertLayer"`
```
File "/nethome/speng65/miniconda3/envs/meta/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 747, in forward
pooled_output = self.dense(first_token_tensor)
File "/nethome/speng65/miniconda3/envs/meta/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/nethome/speng65/miniconda3/envs/meta/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/nethome/speng65/miniconda3/envs/meta/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 117, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat2 must be a matrix, got 1-D tensor
```
The error now comes from the BERT pooler, which is likewise not covered by the wrap policy, so I added `BertPooler` as well; a quick way to list these candidate classes is sketched right below.
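For reference, this is one way to see which top-level submodules of the underlying model hold parameters of their own (it assumes the first sentence-transformers module is the usual `Transformer` wrapper exposing `auto_model`):
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("thenlper/gte-base")
bert = model[0].auto_model  # the underlying transformers BertModel

# List the top-level children, their classes, and their parameter counts.
for name, child in bert.named_children():
    n_params = sum(p.numel() for p in child.parameters())
    print(f"{name}: {type(child).__name__} ({n_params:,} params)")
# embeddings -> BertEmbeddings, encoder -> BertEncoder, pooler -> BertPooler
```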
4. `fsdp_transformer_layer_cls_to_wrap: "BertEmbeddings,BertLayer,BertPooler"`
```
File "/nethome/speng65/miniconda3/envs/meta/lib/python3.10/site-packages/transformers/trainer.py", line 2052, in train
return inner_training_loop(
File "/nethome/speng65/miniconda3/envs/meta/lib/python3.10/site-packages/transformers/trainer.py", line 2434, in _inner_training_loop
_grad_norm = self.accelerator.clip_grad_norm_(
File "/nethome/speng65/miniconda3/envs/meta/lib/python3.10/site-packages/accelerate/accelerator.py", line 2372, in clip_grad_norm_
return model.clip_grad_norm_(max_norm, norm_type)
File "/nethome/speng65/miniconda3/envs/meta/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/nethome/speng65/miniconda3/envs/meta/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1111, in clip_grad_norm_
_lazy_init(self, self)
File "/nethome/speng65/miniconda3/envs/meta/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 139, in _lazy_init
_share_state_and_init_handle_attrs(state, root_module)
File "/nethome/speng65/miniconda3/envs/meta/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 208, in _share_state_and_init_handle_attrs
_p_assert(
File "/nethome/speng65/miniconda3/envs/meta/lib/python3.10/site-packages/torch/distributed/utils.py", line 166, in _p_assert
raise AssertionError(s)
AssertionError: Non-root FSDP instance's `_is_root` should not have been set yet or should have been set to `False`
```
This is a new error, and I cannot find useful resources on resolving it. [This issue](https://github.com/pytorch/pytorch/issues/113496) seems to be the closest one, but no solution was provided in the end.
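In case it helps with debugging, here is the probe I would use to inspect how the model actually got wrapped. It continues the training sketch above (so `trainer` is the `SentenceTransformerTrainer` defined there), and I am not certain it looks at the right thing:
```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# After the Trainer has wrapped the model (e.g. from a TrainerCallback),
# list every FSDP unit, the class it wraps, and its private _is_root flag,
# which is what the assertion above complains about.
wrapped = trainer.model_wrapped  # the FSDP-wrapped model held by the Trainer
for unit in FSDP.fsdp_modules(wrapped):
    print(type(unit.module).__name__, getattr(unit, "_is_root", None))
```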
So, my questions are:
1. What should I set for this argument? When training a transformer model with FSDP, should I include every module class defined in the modeling_xxx.py file?
2. If I launch the code with `accelerate launch --config_file fsdp.yaml xxx.py`, do I have to set the FSDP-related arguments in the training arguments again, e.g., [fsdp, fsdp_config](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments.fsdp)? (The sketch after this list shows what I mean.)
3. How can I resolve the error above?
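For question 2, this is the alternative I mean: configuring FSDP through `TrainingArguments` instead of (or on top of?) the Accelerate YAML. The exact keys below are my reading of the docs, not something I have verified:
```python
from transformers import TrainingArguments

# FSDP configured on the Trainer side rather than in fsdp.yaml.
# Key names come from the TrainingArguments documentation; I have not
# confirmed how this interacts with an `accelerate launch` FSDP config.
args = TrainingArguments(
    output_dir="out",
    fsdp="full_shard auto_wrap",
    fsdp_config={"transformer_layer_cls_to_wrap": ["BertLayer"]},
)
```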
### Expected behavior
See above