Hi,
The GPT-2 config (and, as far as I can tell, every other model config in Hugging Face) exposes a single `n_head` setting, so the number of attention heads can only be changed for all layers at once.
How can I create a model with a different number of attention heads in each layer? For instance, a 2-layer GPT-2 model with 1 head in the first layer and 2 heads in the second.
Is it possible to do this without manually editing the model’s code?
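To make the 2-layer example concrete, here is a rough sketch of the layout I am after. The names `heads_per_layer` and `head_dims` are made up for illustration and are not part of any Hugging Face API, and the embedding width of 64 is an arbitrary assumption. It only checks that each layer's head count divides the embedding width, since GPT-2 splits the embedding evenly across heads.

```python
# Hypothetical per-layer head specification; NOT a Hugging Face API.
n_embd = 64               # assumed embedding width, chosen only for illustration
heads_per_layer = [1, 2]  # desired: 1 head in layer 1, 2 heads in layer 2


def head_dims(n_embd, heads_per_layer):
    """Return the per-head dimension for each layer.

    Raises ValueError if a layer's head count does not divide n_embd,
    because the embedding is split evenly across the heads of a layer.
    """
    dims = []
    for layer, n_head in enumerate(heads_per_layer, start=1):
        if n_head <= 0 or n_embd % n_head != 0:
            raise ValueError(
                f"layer {layer}: n_embd={n_embd} is not divisible by n_head={n_head}"
            )
        dims.append(n_embd // n_head)
    return dims


layer_head_dims = head_dims(n_embd, heads_per_layer)
print(layer_head_dims)
```

With these assumed numbers, the first layer would use one head spanning the full width and the second layer two heads of half that width each.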
Thanks.