I would like to apply movement pruning to GPT-2. Instead of linear layers, this model uses Conv1D layers. Could I replace them with
MaskedLinear layers even though they are not linear layers? I would appreciate any advice on this. In particular, you return torch.nn.functional.linear after pruning, but GPT-2 uses the Hugging Face implementation of Conv1D. Could you tell me whether I can still use your method?
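For context, here is a minimal sketch of why the swap might work. It assumes that Hugging Face's Conv1D computes `x @ weight + bias` with `weight` stored as `(in_features, out_features)`, which is the transpose of the `(out_features, in_features)` layout that `torch.nn.functional.linear` expects. `conv1d_forward` reimplements that forward pass locally, so the sketch does not import the library class. `masked_linear_forward` is a hypothetical stand-in for a MaskedLinear-style forward: it applies a plain binary mask rather than the movement-pruning score/threshold logic. The check shows that, after a transpose, masking and calling `F.linear` gives the same result as Conv1D with the transposed mask applied to its weight.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

n_in, n_out = 4, 3
# Conv1D-style parameters (assumed layout): weight is (in_features, out_features),
# the transpose of nn.Linear's (out_features, in_features).
conv_weight = torch.randn(n_in, n_out)
bias = torch.randn(n_out)
x = torch.randn(2, 5, n_in)  # (batch, seq_len, in_features)


def conv1d_forward(x, weight, bias):
    """Local re-implementation of a Conv1D-style forward pass: x @ W + b."""
    size_out = x.size()[:-1] + (weight.size(-1),)
    out = torch.addmm(bias, x.reshape(-1, x.size(-1)), weight)
    return out.view(size_out)


def masked_linear_forward(x, weight, bias, mask):
    """Hypothetical MaskedLinear-style forward (plain binary mask, not the
    movement-pruning scores). F.linear expects (out, in), so the Conv1D
    weight is transposed before masking; the mask is therefore (out, in)."""
    return F.linear(x, weight.t() * mask, bias)


# Unpruned: an all-ones mask should reproduce the Conv1D output.
conv_out = conv1d_forward(x, conv_weight, bias)
lin_out = masked_linear_forward(x, conv_weight, bias, torch.ones(n_out, n_in))
print(torch.allclose(conv_out, lin_out, atol=1e-5))

# Pruned: masking in linear layout equals masking the Conv1D weight with mask.t().
mask = (torch.rand(n_out, n_in) > 0.5).float()
pruned_conv = conv1d_forward(x, conv_weight * mask.t(), bias)
pruned_lin = masked_linear_forward(x, conv_weight, bias, mask)
print(torch.allclose(pruned_conv, pruned_lin, atol=1e-5))
```

If this equivalence holds for the actual library class, the main thing to watch when loading pretrained Conv1D weights into a linear-style masked layer is the transpose. The mask or score tensor would then need the `(out_features, in_features)` shape.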