Writing tests for attention-free transformers

Hi,

I’m writing an integration for RWKV-v2, a fast causal LM with many optimisations over AFT, the Attention-Free Transformer (transformers issue 17230). The common test suite often hardcodes checks of attention-dependent features; for example, in the ConfigTester from transformers/tests/test_configuration_common.py:

    def create_and_test_config_common_properties(self):
        config = self.config_class(**self.inputs_dict)
        common_properties = ["hidden_size", "num_attention_heads", "num_hidden_layers"]
...

For an attention-free transformer it should not be possible to set and retrieve non-zero values for num_attention_heads, so the common property check above doesn’t apply.

I am a bit reluctant to remove and override test_configuration_common without asking: what’s a sensible approach to dealing with these inherited assumptions that aren’t true for a new model architecture? Are there some examples worth looking at?


Don’t other tests handle this by subclassing ConfigTester? e.g.

    class PoolFormerConfigTester(ConfigTester):
        def create_and_test_config_common_properties(self):
            config = self.config_class(**self.inputs_dict)
            self.parent.assertTrue(hasattr(config, "hidden_sizes"))
            self.parent.assertTrue(hasattr(config, "num_encoder_blocks"))
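For what it’s worth, here is a minimal sketch of the same pattern applied to an attention-free model. RwkvConfigTester, RwkvConfig, and the exact property list are assumptions for illustration, not the actual RWKV-v2 code; the import path depends on where the test file lives:

    # Hypothetical test file for an attention-free model (path/imports are assumptions).
    from ...test_configuration_common import ConfigTester

    class RwkvConfigTester(ConfigTester):
        def create_and_test_config_common_properties(self):
            config = self.config_class(**self.inputs_dict)
            # Only check properties that exist for an attention-free config;
            # num_attention_heads is deliberately left out.
            for prop in ["hidden_size", "num_hidden_layers"]:
                self.parent.assertTrue(hasattr(config, prop))

You would then instantiate it in the model test’s setUp in place of the plain ConfigTester, e.g. self.config_tester = RwkvConfigTester(self, config_class=RwkvConfig, hidden_size=37), and test_config can keep calling self.config_tester.run_common_tests() unchanged.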