Hi, I am trying out ZeRO-style parallelism for large-scale model training using Facebook's FairScale library rather than DeepSpeed as the ZeRO implementation. Specifically, I am hoping to apply ZeRO-3 to large transformer models. FairScale's FSDP (FullyShardedDataParallel) module lets users either (1) wrap the whole transformer model in FSDP directly for easier usage, or (2) wrap the model layer by layer for better parallelization and memory savings (example usage).
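For reference, the direct approach (1) that I have been using looks roughly like this (just a sketch, assuming fairscale is installed and a torch.distributed process group has already been initialized, since FSDP needs one):

```python
import torchvision
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# Option (1): wrap the entire model in a single FSDP instance.
# All parameters are flattened and sharded together.
model = FSDP(torchvision.models.resnet50().cuda())
```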
I have tried both wrapping approaches on vision models like ResNet, since I can either import the model from torchvision and wrap it directly, or do per-layer wrapping by finding a coded implementation of the ResNet architecture and modifying it directly. For transformers, I have only tried directly wrapping models imported from Hugging Face. My question is: for such transformers, is it possible to access the coded model architecture so that per-layer FSDP wrapping can be applied?
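For concreteness, here is roughly what I have in mind (a sketch only, not tested end-to-end; it assumes fairscale and transformers are installed, a distributed process group is initialized, and relies on BertModel specifically exposing its transformer blocks as the nn.ModuleList model.encoder.layer — other architectures name their submodules differently):

```python
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP
from fairscale.nn.wrap import enable_wrap, wrap
from transformers import BertConfig, BertModel


def wrap_bert_layers(model: BertModel) -> FSDP:
    # Hugging Face models are ordinary torch.nn.Modules, so their submodules
    # are accessible as attributes without modifying the library source.
    with enable_wrap(wrapper_cls=FSDP):
        for i, layer in enumerate(model.encoder.layer):
            # Option (2): replace each transformer block with an
            # FSDP-wrapped copy so its parameters are sharded per layer.
            model.encoder.layer[i] = wrap(layer)
        # Wrap the remainder (embeddings, pooler) at the top level.
        return wrap(model)


# BertConfig() builds a randomly initialized model, avoiding a download.
model = wrap_bert_layers(BertModel(BertConfig()))
```

I also noticed fairscale ships an auto_wrap utility with a size-based policy, which seems intended to automate exactly this kind of nested wrapping, so I would be happy to hear whether that is the recommended route instead of manual replacement.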
Thank you in advance!