How to create the fsdp_config JSON file for Trainer?

I want to do Alpaca-style training (the tatsu-lab/stanford_alpaca repo on GitHub, specifically stanford_alpaca/train.py at main) with FSDP and model sharding, but I am deviating from the Alpaca training method in several ways. I want to launch with:
python3 -m torch.distributed.launch --nproc_per_node=4 train-pythia-script.py
(which, as far as I can tell, is the best-accepted way to use Trainer with FSDP as of May 2023).
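
Side note for readers: recent PyTorch releases deprecate python -m torch.distributed.launch in favor of torchrun. Assuming the same script name and GPU count, the equivalent launch would typically be:

torchrun --nproc_per_node=4 train-pythia-script.py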

Also, I am directly specifying arguments in the Trainer init:
training_args = TrainingArguments(
    output_dir="./results",
    fp16=True,
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # alpaca
    evaluation_strategy="no",  # alpaca
    save_steps=10_000,
    save_total_limit=1,  # alpaca
    learning_rate=2e-5,  # alpaca
    weight_decay=0.0,  # alpaca
    warmup_ratio=0.03,  # alpaca
    lr_scheduler_type="cosine",  # alpaca
    logging_steps=1,  # alpaca
    fsdp="full_shard auto_wrap",  # alpaca
    fsdp_config="fsdp_config_pythia.json",
)

I am using the MAIN branch of transformers.

When I tried to include the fsdp_transformer_layer_cls_to_wrap argument in TrainingArguments, the code gave me a FutureWarning,
/home/ubuntu/project/transformers/src/transformers/training_args.py:1462: FutureWarning: using --fsdp_transformer_layer_cls_to_wrap is deprecated. Use fsdp_config instead

and an error:
Exception: Could not find the transformer layer class to wrap in the model.

so I switched to the fsdp_config='fsdp_config_pythia.json' argument.

How do I create the fsdp_config JSON file?
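
For anyone landing on this thread: below is a minimal sketch of what fsdp_config_pythia.json could contain, based on the fsdp_config keys documented for transformers around v4.28/v4.29. The class name GPTNeoXLayer is my assumption, based on Pythia using the GPT-NeoX architecture; verify it against your model's actual module class names (e.g. with print(model)):

{
    "fsdp_transformer_layer_cls_to_wrap": ["GPTNeoXLayer"]
}

For a Llama model, the corresponding class name would be LlamaDecoderLayer. Note that the exact key names have changed across transformers versions, so check the TrainingArguments docs for your release.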

I think it is working now, after

  1. downgrading to transformers 4.26.1 (which does not use the fsdp_config argument)
  2. removing fsdp_config argument
  3. adding back the fsdp_transformer_layer_cls_to_wrap argument (see the sketch after this list)
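
For concreteness, a sketch of that 4.26.1-style workaround (same values as the TrainingArguments above, abbreviated here; GPTNeoXLayer is again my assumption for Pythia's transformer block class):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    fp16=True,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    fsdp="full_shard auto_wrap",
    # deprecated in newer releases in favor of fsdp_config,
    # but still accepted by transformers 4.26.1:
    fsdp_transformer_layer_cls_to_wrap="GPTNeoXLayer",
)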

It is using less memory compared to non-FSDP mode, so I think the model is actually being sharded.

Unfortunately this 'fix' worked for my Pythia model test, but I need the newer version of transformers for Llama model support …

cc @smangrul

Same here. It would be better if the warning linked to docs or an example showing users how to add fsdp_config,

FutureWarning: using --fsdp_transformer_layer_cls_to_wrap is deprecated. Use fsdp_config instead

rather than just saying the option is deprecated. An intuitive example would save users time.
