The transformers.LEDConfig documentation states:
This is the configuration class to store the configuration of a LEDModel. It is used to instantiate an LED model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the LED [allenai/led-base-16384] architecture.
In transformers.LEDConfig, both the encoder and the decoder default to 12 layers, which according to the documentation should correspond to the LED base model configuration. However, in the original paper (https://arxiv.org/pdf/2004.05150.pdf), the LED base model has 6 layers in both the encoder and the decoder.
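For reference, this is roughly how the default layer counts can be inspected. It is only a minimal sketch: `layer_counts` is a helper name made up for this post, and it assumes the config object exposes `encoder_layers` and `decoder_layers` attributes, as LEDConfig does.

```python
def layer_counts(config):
    """Return (encoder_layers, decoder_layers) from an LED-style config object."""
    return config.encoder_layers, config.decoder_layers


try:
    from transformers import LEDConfig
except ImportError:
    LEDConfig = None  # transformers is not installed, so there is nothing to inspect

if LEDConfig is not None:
    # Built from the library defaults only; no checkpoint is downloaded.
    default_config = LEDConfig()
    print("LEDConfig() defaults (encoder, decoder):", layer_counts(default_config))
```

Passing a config loaded with `LEDConfig.from_pretrained("allenai/led-base-16384")` to the same helper should show what the published checkpoint itself uses, though that needs network access.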
Is this a specific design choice or am I missing something?