[Proposal] Copy Pasting modeling_bart.py

I am a big huggingface fan!

I want to voice more support for not copy pasting. I think it makes the code much easier to develop and maintain. I also think it makes it much easier to read and understand, and would disagree with the claim that copy-pasting code makes it easier to read.

  1. When we read code we implicitly delegate understanding to other portions of a code base. When we see something like self.calc_mse_loss we understand what’s going on, even if the function lives in another file. The same is true when we see nn.Linear- we don’t need to copy all of the torch code and instead rely on torch to have implemented it correctly. By copying lots of code between models, we have it all in one place, but we make it hard for readers to read and understand the code at a higher-level of abstraction.

  2. It is helpful when reading code to understand exactly where a model differs from similar models, but if all the code is copy-pasted into a model class, it is hard to figure this out. For example, when we see class RobertaEmbeddings(BertEmbeddings) we know that RobertaEmbeddings is largely similar to a BertEmbeddings, and the code that follows in the definition of RobertaEmbeddings is only the ways in which Roberta is different from Bert. Instead, if we copy paste a lot of code from BERT into Roberta, we don’t know how they differ without carefully reading both, and it is difficult to even figure out which classes are more similar to Roberta. I think this is more cognitive burden on someone reading the code, not less.

  3. (really 2a) Requiring that code be identical leads to cases where naming conventions are confusing. For example T5 calls things blocks while other models refer to them as layers. This leads to cases where we need nearly identical methods just because things are named differently (e.g. copy_to_student). As a reader, this further obfuscates the similarities and differences between T5 and other models.

2 Likes