How to create the fsdp_config JSON file for Trainer?

I want to do Alpaca-style training (the tatsu-lab/stanford_alpaca repo on GitHub, specifically stanford_alpaca/train.py at main) with FSDP and model sharding, but I am deviating from the Alpaca training method in several ways. I want to launch with:
python3 -m torch.distributed.launch --nproc_per_node=4 train-pythia-script.py
(which, as far as I can tell, is the best-accepted way to use Trainer with FSDP as of May 2023).
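
Side note for readers: recent PyTorch releases deprecate python -m torch.distributed.launch in favor of torchrun. Assuming the same script name and GPU count, the equivalent launch would typically be:

torchrun --nproc_per_node=4 train-pythia-script.py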

Also, I am directly specifying arguments in the Trainer init:
training_args = TrainingArguments(
    output_dir="./results",
    fp16=True,
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # alpaca
    evaluation_strategy="no",  # alpaca
    save_steps=10_000,
    save_total_limit=1,  # alpaca
    learning_rate=2e-5,  # alpaca
    weight_decay=0.0,  # alpaca
    warmup_ratio=0.03,  # alpaca
    lr_scheduler_type="cosine",  # alpaca
    logging_steps=1,  # alpaca
    fsdp="full_shard auto_wrap",  # alpaca
    fsdp_config="fsdp_config_pythia.json",
)

I am using the MAIN branch of transformers.

When I tried to include the fsdp_transformer_layer_cls_to_wrap argument in TrainingArguments, the code gave me a FutureWarning,
/home/ubuntu/project/transformers/src/transformers/training_args.py:1462: FutureWarning: using --fsdp_transformer_layer_cls_to_wrap is deprecated. Use fsdp_config instead

and an error:
Exception: Could not find the transformer layer class to wrap in the model.

so I switched to the fsdp_config='fsdp_config_pythia.json' argument.

How do I create the fsdp_config JSON file?
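
For anyone landing on this thread: below is a minimal sketch of what fsdp_config_pythia.json could contain, based on the fsdp_config keys documented for transformers around v4.28/v4.29. The class name GPTNeoXLayer is my assumption, based on Pythia using the GPT-NeoX architecture; verify it against your model's actual module class names (e.g. with print(model)):

{
    "fsdp_transformer_layer_cls_to_wrap": ["GPTNeoXLayer"]
}

For a Llama model, the corresponding class name would be LlamaDecoderLayer. Note that the exact key names have changed across transformers versions, so check the TrainingArguments docs for your release.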

I think it is working now, after

  1. downgrading to transformers 4.26.1 (which does not use the fsdp_config argument)
  2. removing fsdp_config argument
  3. adding back the fsdp_transformer_layer_cls_to_wrap argument (see the sketch after this list)
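
For concreteness, a sketch of that 4.26.1-style workaround (same values as the TrainingArguments above, abbreviated here; GPTNeoXLayer is again my assumption for Pythia's transformer block class):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    fp16=True,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    fsdp="full_shard auto_wrap",
    # deprecated in newer releases in favor of fsdp_config,
    # but still accepted by transformers 4.26.1:
    fsdp_transformer_layer_cls_to_wrap="GPTNeoXLayer",
)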

It is using less memory compared to non-FSDP mode, so I think the model is actually being sharded.

Unfortunately this 'fix' worked for my Pythia model test, but I need the newer version of transformers for Llama model support …

cc @smangrul

Same here. It would be better if the warning linked to docs or an example showing users how to add fsdp_config,

FutureWarning: using --fsdp_transformer_layer_cls_to_wrap is deprecated. Use fsdp_config instead

rather than just saying the option is deprecated. An intuitive example would save users time.
