I want to do Alpaca-style training (GitHub: tatsu-lab/stanford_alpaca, in particular stanford_alpaca/train.py) with FSDP and model sharding. But I am deviating from the Alpaca training method in several ways. First, I want to launch with:
python3 -m torch.distributed.launch --nproc_per_node=4 train-pythia-script.py
(which, as far as I can tell, is the most widely accepted way to use Trainer with FSDP as of May 2023).
Second, I am specifying the arguments directly in code, in the TrainingArguments passed to the Trainer init:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    fp16=True,
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # alpaca
    evaluation_strategy="no",  # alpaca
    save_steps=10_000,
    save_total_limit=1,  # alpaca
    learning_rate=2e-5,  # alpaca
    weight_decay=0.0,  # alpaca
    warmup_ratio=0.03,  # alpaca
    lr_scheduler_type="cosine",  # alpaca
    logging_steps=1,  # alpaca
    fsdp="full_shard auto_wrap",  # alpaca
    fsdp_config="fsdp_config_pythia.json",
)
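For context, these arguments feed into the Trainer roughly as sketched below; the checkpoint name and dataset are placeholders here, not my exact script:

from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer

# Placeholder Pythia checkpoint; substitute whichever size is actually being trained.
model_name = "EleutherAI/pythia-1.4b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

trainer = Trainer(
    model=model,
    args=training_args,           # the TrainingArguments defined above
    train_dataset=train_dataset,  # placeholder: an Alpaca-style instruction dataset, already tokenized
    tokenizer=tokenizer,
)
trainer.train()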
I am using the MAIN branch of transformers.
When I tried to include the fsdp_transformer_layer_cls_to_wrap argument in TrainingArguments, the code gave me a FutureWarning:
/home/ubuntu/project/transformers/src/transformers/training_args.py:1462: FutureWarning: using --fsdp_transformer_layer_cls_to_wrap is deprecated. Use fsdp_config instead
and an error:
Exception: Could not find the transformer layer class to wrap in the model.
so I switched to the fsdp_config='fsdp_config_pythia.json' argument instead.
How do I create the fsdp_config JSON file?
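For reference, my current guess is something along the lines of the snippet below, which just writes the JSON from Python. I am not sure whether the key name is right, or whether GPTNeoXLayer (the decoder block class used by Pythia / GPT-NeoX models in transformers) is the class the auto-wrap policy expects, which is exactly what I would like to confirm:

import json

# My guess at the config contents; the key name follows the fsdp_config schema
# as I understand it on main, and GPTNeoXLayer is the transformer block class
# of GPT-NeoX / Pythia models.
fsdp_config = {
    # Not sure whether this should be a plain string or a list of class names
    # on the main branch, so this is a guess.
    "fsdp_transformer_layer_cls_to_wrap": "GPTNeoXLayer",
}

with open("fsdp_config_pythia.json", "w") as f:
    json.dump(fsdp_config, f, indent=2)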