Convert Conv1D to nn.Linear

Hi everyone,

I’m doing some research on GPT2 and want to test a few ideas that apply to linear layers.

What’s the problem? The GPT2 model consists of Conv1D layers instead of nn.Linear.

I know that Conv1D is said to be just like a linear layer, but with a transposed weight. Still, I’d love it if someone could help me convert the Hugging Face pre-trained GPT2 model (the one with Conv1D) to an equivalent GPT2 model that uses nn.Linear instead.

Thanks in advance for any help!

#transformers #models #research

Hi @IdoAmit198,

Did you figure out a way? I also want to apply a function I’ve written to this model, but it currently only works on nn.Linear layers!

Any help is appreciated.

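This is as simple as it seems. Conv1D stores its weight with shape (in_features, out_features), which is the transpose of the (out_features, in_features) layout that nn.Linear uses. So just set the weight of a linear layer to the transposed weight of the Conv1D layer, and set the bias of the linear layer to be the same as the bias of the Conv1D.

For example: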

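import torch

test_in = torch.rand((1, 1024))
test_c1d = <some pretrained Conv1D layer of size (1024,1024)>
test_lin = torch.nn.Linear(1024, 1024)
# Conv1D weight is (in_features, out_features); nn.Linear expects (out_features, in_features).
test_lin.weight = torch.nn.Parameter(test_c1d.weight.T)
# The bias has the same shape in both layers, so it can be reused directly.
test_lin.bias = test_c1d.bias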

test_out_c1d = test_c1d(test_in)
test_out_lin = test_lin(test_in)

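Check that the results are equivalent. An exact == comparison can report mismatches caused only by floating-point rounding, so compare within a tolerance:

torch.allclose(test_out_c1d, test_out_lin, atol=1e-5)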
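To convert a whole model rather than a single layer, the same idea can be applied recursively to every submodule. The following is only a minimal sketch under stated assumptions: the Conv1D class here is a local stand-in that mimics the (in_features, out_features) weight layout described above, and conv1d_to_linear and replace_conv1d are hypothetical helper names, not part of any library. With the real model you would check against the Conv1D class that your installed transformers version actually uses instead of the stand-in.

```python
import torch
from torch import nn


class Conv1D(nn.Module):
    """Local stand-in (assumption): computes y = x @ W + b with W of shape (nx, nf)."""

    def __init__(self, nf, nx):
        super().__init__()
        self.nf = nf
        self.weight = nn.Parameter(torch.randn(nx, nf) * 0.02)
        self.bias = nn.Parameter(torch.randn(nf) * 0.02)

    def forward(self, x):
        size_out = x.size()[:-1] + (self.nf,)
        x = torch.addmm(self.bias, x.reshape(-1, x.size(-1)), self.weight)
        return x.view(size_out)


def conv1d_to_linear(c1d):
    """Hypothetical helper: build an nn.Linear computing the same function as c1d."""
    in_features, out_features = c1d.weight.shape
    lin = nn.Linear(in_features, out_features,
                    device=c1d.weight.device, dtype=c1d.weight.dtype)
    with torch.no_grad():
        lin.weight.copy_(c1d.weight.T)  # transpose into (out_features, in_features)
        lin.bias.copy_(c1d.bias)        # bias layout is identical
    return lin


def replace_conv1d(module):
    """Hypothetical helper: recursively swap every Conv1D child for an nn.Linear."""
    for name, child in list(module.named_children()):
        if isinstance(child, Conv1D):
            setattr(module, name, conv1d_to_linear(child))
        else:
            replace_conv1d(child)
    return module


# Toy model standing in for a GPT2 MLP block (sizes chosen only for illustration).
torch.manual_seed(0)
toy = nn.Sequential(Conv1D(32, 8), nn.GELU(), Conv1D(8, 32))
x = torch.rand(2, 5, 8)
with torch.no_grad():
    before = toy(x)
replace_conv1d(toy)
with torch.no_grad():
    after = toy(x)
print(torch.allclose(before, after, atol=1e-6))
```

Copying the values into fresh parameters (instead of sharing them) keeps the converted layers independent of the originals, which matters if you plan to fine-tune or modify the new nn.Linear layers afterwards.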