I agree with Iz here.
As long as the code a model inherits lives in a single file, it stays readable, and personally copy-pasting doesn't make much sense to me.
And if the models share code, then adding new functionality (gradient checkpointing, new heads) to the base model gives me the same functionality for free in the sub-classed models. This actually helps a lot with experimentation. For example, I wanted to try MBart for sequence classification, and since it inherits completely from BART, all I had to do was subclass BartForSequenceClassification.
Without this, I would have had to copy-paste the head and test it again, which slows things down.
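To make the point concrete, here's a minimal toy sketch of the pattern (stand-in classes, not the real transformers code): a task head defined on the base model is inherited unchanged by the derived model, so no head needs to be copy-pasted or re-tested.

```python
# Toy illustration of the inheritance pattern -- the class names mirror
# transformers, but the bodies are stand-ins, not the real implementation.

class BartModel:
    def forward(self, x):
        return [v * 2 for v in x]  # stand-in for the real encoder/decoder


class BartForSequenceClassification(BartModel):
    num_labels = 3

    def forward(self, x):
        hidden = super().forward(x)
        return sum(hidden) % self.num_labels  # stand-in classification head


class MBartModel(BartModel):
    pass  # inherits the full BART implementation


# Because MBart inherits BART completely, reusing the classification head
# is a one-line subclass: the head (and its tests) come for free.
class MBartForSequenceClassification(BartForSequenceClassification, MBartModel):
    pass


print(MBartForSequenceClassification().forward([1, 2, 3]))
```

The MRO routes the head's `super().forward()` through `MBartModel`, so the derived model gets the head's behavior without any duplicated code.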
Another example: I wanted to experiment with Camembert in EncoderDecoder, and since it inherits from Roberta, which already worked with EncoderDecoder, it was a very simple and fast change without requiring much extra code or tests, since the tests also came free with the base model.
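The same idea in a toy sketch (hypothetical stand-in classes, not the real transformers API): a wrapper written against the parent class only relies on the parent's interface, so any subclass plugs in unchanged.

```python
# Toy illustration: the wrapper targets RobertaModel's interface, so a
# subclass like CamembertModel works with it out of the box.

class RobertaModel:
    def encode(self, text):
        return len(text)  # stand-in for the real encoder


class CamembertModel(RobertaModel):
    pass  # inherits everything from RobertaModel


class EncoderDecoder:
    def __init__(self, encoder):
        # The wrapper only assumes the RobertaModel interface.
        if not isinstance(encoder, RobertaModel):
            raise TypeError("encoder must provide the RobertaModel interface")
        self.encoder = encoder

    def run(self, text):
        return self.encoder.encode(text) + 1  # stand-in decoding step


# CamembertModel needs no extra code to be used here.
print(EncoderDecoder(CamembertModel()).run("abc"))
```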
And IMO in some cases such refactoring might even introduce unintended effects: IIRC, after Longformer was refactored to remove the Roberta abstraction, a major slowdown was introduced; @beltagy might remember this.