I think it is working now, after:
- downgrading to transformers 4.26.1 (which does not support the `fsdp_config` argument)
- removing the `fsdp_config` argument
- adding back the `fsdp_transformer_layer_cls_to_wrap` argument
It is using less memory than in non-FSDP mode, so I think the model is actually being sharded.
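
For reference, a minimal sketch of what the working setup looks like on 4.26.1. The output dir, batch size, and the `BertLayer` class name are placeholders; substitute the transformer block class of whatever model you are wrapping:

```python
from transformers import TrainingArguments

# transformers 4.26.1: TrainingArguments has no `fsdp_config` parameter;
# the layer class to wrap is passed directly as a string instead.
training_args = TrainingArguments(
    output_dir="./fsdp-out",              # placeholder
    per_device_train_batch_size=1,
    fsdp="full_shard auto_wrap",          # shard params/grads/optimizer state, auto-wrap layers
    fsdp_transformer_layer_cls_to_wrap="BertLayer",  # placeholder: your model's block class
)
```

Note that FSDP only kicks in under a distributed launch (e.g. `torchrun --nproc_per_node=<num_gpus> train.py`); with a single process there is nothing to shard across.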