Applying movement-pruning on GPT2

Dara · September 21, 2021, 1:56pm

Hi,
I would like to apply movement pruning to GPT2. This model uses Conv1D layers instead of linear layers. Could I replace them with the MaskedLinear layers even though they are not linear layers? I would appreciate any advice on this, especially about the point where you return torch.nn.functional.linear after pruning: since GPT-2 uses the version of Conv1D implemented in huggingface, could you kindly tell me if I can still use your method?

Thanks in advance @wolfblue and @VictorSanh
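
For anyone with the same question: the huggingface Conv1D used by GPT-2 is, as far as its forward pass goes, a linear layer whose weight is stored transposed, with shape (in_features, out_features) rather than (out_features, in_features). The sketch below is only an illustration of that equivalence, not the movement-pruning authors' method. The Conv1D class is a stand-in written for this example to mirror the library layer, conv1d_to_linear and masked_linear_forward are hypothetical helper names, and the fixed random mask is an assumption standing in for whatever mask the pruning method would learn.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Conv1D(nn.Module):
    """Stand-in for the GPT-2 Conv1D layer (assumption: mirrors the
    huggingface layer, whose weight has shape (in_features, out_features))."""

    def __init__(self, nf, nx):
        super().__init__()
        self.nf = nf  # number of output features
        self.weight = nn.Parameter(torch.randn(nx, nf) * 0.02)
        self.bias = nn.Parameter(torch.zeros(nf))

    def forward(self, x):
        size_out = x.size()[:-1] + (self.nf,)
        # Computes bias + x @ weight on the flattened input.
        x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight)
        return x.view(size_out)


def conv1d_to_linear(conv):
    """Hypothetical helper: build an nn.Linear with the same outputs."""
    nx, nf = conv.weight.shape
    linear = nn.Linear(nx, nf)
    with torch.no_grad():
        # nn.Linear stores (out_features, in_features), so transpose.
        linear.weight.copy_(conv.weight.t())
        linear.bias.copy_(conv.bias)
    return linear


def masked_linear_forward(x, linear, mask):
    """Hypothetical helper: apply a mask shaped like linear.weight,
    then call F.linear as a masked linear layer would."""
    return F.linear(x, linear.weight * mask, linear.bias)


torch.manual_seed(0)
conv = Conv1D(nf=12, nx=4)
with torch.no_grad():
    conv.bias.normal_()  # non-zero bias so the check is meaningful

linear = conv1d_to_linear(conv)
x = torch.randn(2, 3, 4)

# Unmasked: both layers should agree up to floating-point error.
same = torch.allclose(conv(x), linear(x), atol=1e-6)

# Masked: masking the Linear weight matches masking the transposed
# Conv1D weight with the transposed mask.
mask = (torch.rand_like(linear.weight) > 0.5).float()
masked_out = masked_linear_forward(x, linear, mask)
reference = x @ (conv.weight * mask.t()) + conv.bias
masked_same = torch.allclose(masked_out, reference, atol=1e-6)
print(same, masked_same)
```

If this holds for the real layer too, one option would be to copy each Conv1D's transposed weight into a masked linear layer before pruning. Whether that fits the rest of the movement-pruning code is not confirmed in this thread.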

alexflint · July 16, 2023, 2:55pm

I have run into this same issue. The library prints a message that doesn’t really say what to do:

You are loading your model in 8bit or 4bit but no linear modules were found in your model. this can happen for some architectures such as gpt2 that uses Conv1D instead of Linear layers. Please double check your model architecture, or submit an issue on github if you think this is a bug.

From this message it’s not clear whether gpt2-type models are unsupported, or supported via some other approach.
