Hi,
I'm writing an integration for RWKV-v2, a fast causal LM with many optimisations over AFT, the Attention-Free Transformer (transformers issue 17230). The common tests often hardcode assumptions about attention-dependent features; for example, `ConfigTester` in `transformers/tests/test_configuration_common.py` contains:
```python
def create_and_test_config_common_properties(self):
    config = self.config_class(**self.inputs_dict)
    common_properties = ["hidden_size", "num_attention_heads", "num_hidden_layers"]
    ...
```
For an attention-free transformer it should not be possible to set and retrieve non-zero values for `num_attention_heads`.
I'm a bit reluctant to remove and override `test_configuration_common` without asking first: what's a sensible approach to dealing with these inherited assumptions that don't hold for a new model architecture? Are there existing models worth looking at as examples? To make the question concrete, a rough sketch of the kind of override I have in mind follows.
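This is only a sketch, not a proposal: `Rwkv2Config` and the class names are placeholders of mine, the import path depends on where the test file lives relative to `transformers/tests/`, and I have only loosely mirrored the parent method's checks rather than copying them verbatim.

```python
import unittest

# Assumes this file sits next to transformers/tests/test_configuration_common.py;
# adjust the import path to match the actual layout.
from test_configuration_common import ConfigTester

from transformers import PretrainedConfig


# Placeholder standing in for the real RWKV-v2 config class.
class Rwkv2Config(PretrainedConfig):
    model_type = "rwkv-v2"

    def __init__(self, hidden_size=32, num_hidden_layers=5, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers


class Rwkv2ConfigTester(ConfigTester):
    def create_and_test_config_common_properties(self):
        # Same shape as the inherited check, but `num_attention_heads` is
        # dropped: an attention-free model has no attention heads to configure.
        config = self.config_class(**self.inputs_dict)
        common_properties = ["hidden_size", "num_hidden_layers"]
        for prop in common_properties:
            self.parent.assertTrue(
                hasattr(config, prop), msg=f"`{prop}` does not exist on the config"
            )


class Rwkv2ConfigTest(unittest.TestCase):
    def setUp(self):
        # ConfigTester stores the extra kwargs as the inputs_dict used above.
        self.config_tester = Rwkv2ConfigTester(
            self, config_class=Rwkv2Config, hidden_size=37, num_hidden_layers=5
        )

    def test_config_common_properties(self):
        self.config_tester.create_and_test_config_common_properties()
```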