“Don’t repeat yourself”, or DRY, is a well-known principle of software development. The principle originates from “The Pragmatic Programmer”, one of the most-read books on code design. The principle’s simple message makes obvious sense: don’t rewrite logic that already exists somewhere else. This keeps the code in sync, making it easier to maintain and more robust. Any change to this logical pattern uniformly affects all of its dependencies.
At first glance, the design of Hugging Face’s Transformers library couldn’t be more contrary to the DRY principle. Code for the attention mechanism is more or less copied over 50 times into different model files. Sometimes the code of the whole BERT model is copied into other model files. We often force new model contributions that are identical to existing models - besides a small logical tweak - to copy all of the existing code. Why do we do this? Are we just too lazy or overwhelmed to centralize all logical pieces into one place?
No, we are not lazy - it’s a very conscious decision not to apply the DRY design principle to the Transformers library. Instead, we decided to adopt a different design principle which we like to call the single model file policy. The single model file policy states that all code necessary for the forward pass of a model lives in one and only one file - the model file. If a reader wants to understand how BERT works for inference, she should only have to look into BERT’s `modeling_bert.py` file. We usually reject any attempt to abstract identical sub-components of different models into a new centralized place. We don’t want an `attention_layer.py` that includes all possible attention mechanisms. Again, why do we do this?
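To make the trade-off concrete, here is a toy, purely illustrative sketch (the function names and code are made up, not actual Transformers code). Under DRY, both models would import one shared attention function from a central `attention_layer.py`; under the single model file policy, each model file keeps its own copy, so a change to one model can never silently affect another:

```python
# Toy illustration of the single model file policy (hypothetical code).

# --- "modeling_bert.py" ---
def bert_attention_scores(q, k):
    # BERT keeps its own copy of the score computation.
    return [[sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]

# --- "modeling_gpt2.py" ---
def gpt2_attention_scores(q, k):
    # A nearly identical copy: editing BERT's version cannot break GPT-2.
    return [[sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]

q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
print(bert_attention_scores(q, k))  # [[1.0, 0.0]]
```

The duplication is deliberate: a reader of either "file" sees the full computation without chasing an import.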
In short, the reasons are:
1. Transformers is built for and by the open-source community.
2. Our products are models and our customers are users reading or tweaking model code.
3. The field of machine learning evolves extremely fast.
4. Machine learning models are static.
Read the full blog post here.
We’re keen to hear what you think! Leave your opinion below.
I really like this idea, especially as someone who has struggled to understand all the layers of abstraction in other libraries.
I think all the DRY use cases can be left to the PyTorch/TF libraries, where the level of abstraction is consistent enough that it will apply to ML research for years to come (e.g. I don’t see autograd going away anytime soon).
The single model file policy is practical and very friendly to new/intermediate users. I like it!
Is a single file with 1000+ lines easy for beginners to look through the first time and understand what the architecture is doing?
I really liked the Hugging Face philosophy, by the way, but was getting intimidated looking at a large file with that many lines of code.
Keeping it in one file makes it (self-)contained in the first place, and then I like reading top-down as needed.
I wonder if some combination of composition and codegen might achieve some of the accessibility goals while improving readability.
```python
# LMHead = make_lm_head(model, position_embeddings=Fancy)
class MyNewModelLMHead(nn.Module):
    def __init__(self, config):
        self.transformer = MyNewModelTransformer(config)
        self.lm_head = nn.Linear(...)
        # [Fancy position embedding init code]

    def forward(self, all_the_args):
        # [Fancy position embedding input prep]
        transformer_output = self.transformer(all_the_args)
        if labels: ...
```
Then the same sort of mechanism as “copied from” runs to generate the actual class by composing templates. So when the templates change, a PR gets opened to apply those changes to the generated code.
Major challenges include getting the abstractions right and keeping the generated code readable. Both of those are seriously hard problems, so maybe this idea just can’t fly.
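For what it's worth, here is a minimal sketch of what such template-based generation could look like. Everything here is hypothetical: `make_lm_head_source`, the template text, and the `"Fancy"` snippet names are invented for illustration, and the real mechanism would need far more machinery:

```python
# Hypothetical codegen sketch: render a fully expanded model class from a
# template, so the reader still sees all the code in one place.
from string import Template

LM_HEAD_TEMPLATE = Template('''\
class ${model_name}ForCausalLM:
    def __init__(self, config):
        self.transformer = ${model_name}Transformer(config)
        ${position_embedding_init}

    def forward(self, input_ids):
        ${position_embedding_prep}
        return self.transformer(input_ids)
''')

def make_lm_head_source(model_name: str, position_embeddings: str) -> str:
    """Return generated source code as a string (not an actual class)."""
    snippets = {
        # Snippet library keyed by the position-embedding variant.
        "Fancy": ("self.pos_emb = build_fancy_pos_emb(config)",
                  "input_ids = add_fancy_positions(input_ids)"),
    }
    init, prep = snippets[position_embeddings]
    return LM_HEAD_TEMPLATE.substitute(
        model_name=model_name,
        position_embedding_init=init,
        position_embedding_prep=prep,
    )

print(make_lm_head_source("MyNewModel", "Fancy"))
```

When a template changes, regenerating and diffing the output is what would drive the automatic PRs described above.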
“Getting the abstractions right” - in a fast-moving field it never happens; at best it is transitory, unless you abstract something that is really, really canonical, one and only. One thing I hate about abstraction is that the implementation can silently change under the hood, especially when there are 100 models that subscribe to the abstraction. As a model owner, how can I rest assured that my model is set in stone? And the “beautiful” feeling is really a bubble. In a single model file, on the other hand, the code is much more immutable, with much less surprise. And the ugly is given upfront.
That’s a fair point! The reason for 1000+ lines of code is usually the multiple heads that are supported by the model. The most important code to understand is always the `...Model` class.
Cool idea! We do think it’s important that all the necessary code is already generated in each file, since we don’t expect readers to know about mechanisms like the `# Copied from ...` comment.
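To give a rough idea of what such a consistency check can look like, here is a deliberately simplified, hypothetical sketch (the real Transformers tooling is more involved): it parses both files and verifies that a copied function still matches its source definition, ignoring comments.

```python
# Simplified sketch of a "# Copied from" consistency check (hypothetical;
# not the actual Transformers implementation).
import ast

ORIGINAL = '''
def gelu(x):
    return 0.5 * x * (1 + tanh(x))
'''

COPY = '''
# Copied from modeling_bert.gelu
def gelu(x):
    return 0.5 * x * (1 + tanh(x))
'''

def definition(source: str, name: str) -> str:
    """Return the normalized source of a top-level function definition."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            return ast.unparse(node)  # comments are dropped by the parser
    raise KeyError(name)

def check_copy(original: str, copy: str, name: str) -> bool:
    """True when the copied definition is still in sync with the original."""
    return definition(original, name) == definition(copy, name)

print(check_copy(ORIGINAL, COPY, "gelu"))  # True: the copy is in sync
```

A CI job running a check like this is what lets duplicated code behave, in practice, like a single shared definition.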