Hi,
I'm writing an integration for RWKV-v2, a fast causal LM with many optimisations over AFT, the Attention-Free Transformer (transformers issue 17230). The common tests often hardcode assumptions about attention-dependent features; for example, `ConfigTester` in `transformers/tests/test_configuration_common.py` contains:
```python
def create_and_test_config_common_properties(self):
    config = self.config_class(**self.inputs_dict)
    common_properties = ["hidden_size", "num_attention_heads", "num_hidden_layers"]
    ...
```
For an attention-free transformer it should not be possible to set and retrieve non-zero values for `num_attention_heads`.
I'm a bit reluctant to remove and override `test_configuration_common` without asking first: what's a sensible approach to dealing with these inherited assumptions that don't hold for a new model architecture? Are there existing models worth looking at as examples? To make the question concrete, a rough sketch of the kind of override I have in mind follows.
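This is only a sketch, not a proposal: `Rwkv2Config` and the class names are placeholders of mine, the import path depends on where the test file lives relative to `transformers/tests/`, and I have only loosely mirrored the parent method's checks rather than copying them verbatim.

```python
import unittest

# Assumes this file sits next to transformers/tests/test_configuration_common.py;
# adjust the import path to match the actual layout.
from test_configuration_common import ConfigTester

from transformers import PretrainedConfig


# Placeholder standing in for the real RWKV-v2 config class.
class Rwkv2Config(PretrainedConfig):
    model_type = "rwkv-v2"

    def __init__(self, hidden_size=32, num_hidden_layers=5, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers


class Rwkv2ConfigTester(ConfigTester):
    def create_and_test_config_common_properties(self):
        # Same shape as the inherited check, but `num_attention_heads` is
        # dropped: an attention-free model has no attention heads to configure.
        config = self.config_class(**self.inputs_dict)
        common_properties = ["hidden_size", "num_hidden_layers"]
        for prop in common_properties:
            self.parent.assertTrue(
                hasattr(config, prop), msg=f"`{prop}` does not exist on the config"
            )


class Rwkv2ConfigTest(unittest.TestCase):
    def setUp(self):
        # ConfigTester stores the extra kwargs as the inputs_dict used above.
        self.config_tester = Rwkv2ConfigTester(
            self, config_class=Rwkv2Config, hidden_size=37, num_hidden_layers=5
        )

    def test_config_common_properties(self):
        self.config_tester.create_and_test_config_common_properties()
```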