How to create a custom GPT-2 model with a different number of attention heads in each layer?

Hi,

The GPT-2 config (and any other model config in Hugging Face, for that matter) only allows setting a single, global number of attention heads (n_head for GPT-2), which applies to all layers at once.

How does one create a model with a different number of attention heads in different layers? For instance, say, 1 head in the first layer and 2 heads in the second layer of a 2-layer GPT-2 model.
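
To make the request concrete, here is a minimal sketch of the kind of per-layer specification I have in mind. The names `heads_per_layer` and `per_layer_head_dims` are hypothetical and not part of the Hugging Face API. The sketch only checks that each layer's head count divides the hidden size (768 is assumed here, as in the smallest GPT-2) and reports the resulting per-head dimension; it does not build a model.

```python
from typing import List


def per_layer_head_dims(n_embd: int, heads_per_layer: List[int]) -> List[int]:
    """Return the per-head dimension for each layer.

    Hypothetical helper: `heads_per_layer[i]` is the desired number of
    attention heads in layer i. Each count must divide `n_embd` evenly,
    as is required for standard multi-head attention.
    """
    head_dims = []
    for layer_idx, n_head in enumerate(heads_per_layer):
        if n_head <= 0 or n_embd % n_head != 0:
            raise ValueError(
                f"layer {layer_idx}: n_embd={n_embd} is not divisible by n_head={n_head}"
            )
        head_dims.append(n_embd // n_head)
    return head_dims


# 2-layer example from the question: 1 head in layer 0, 2 heads in layer 1.
print(per_layer_head_dims(768, [1, 2]))  # prints [768, 384]
```

Ideally the config would accept a list like `[1, 2]` in place of the single global value.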

Is it possible to do this without manually editing the model’s code?

Thanks.