We are considering copy pasting the contents of modeling_bart.py into modeling_pegasus.py, modeling_mbart.py and modeling_marian.py. and wanted to see what the community thinks.
Currently, here is
marian and mbart are similar. They do nothing and inherit the complete implementation from bart. They are only their own classes so that
AutoTokenizer work properly.
Arguments for copy paste
- if you want to know how a model works you can read it in one file.
- This is how the rest of the repo works and there is no good reason to introduce a stylistic inconsistency here.
- we can delete 6 configuration parameters, that would become always True/always False.
# params needed for mbart self.scale_embedding = scale_embedding # scale factor will be sqrt(d_model) if True self.normalize_embedding = normalize_embedding # True for mbart, False otherwise self.normalize_before = normalize_before # combo of fairseq's encoder_ and decoder_normalize_before self.add_final_layer_norm = add_final_layer_norm # Params introduced for Marian self.add_bias_logits = add_bias_logits self.static_position_embeddings = static_position_embeddings
- we could also delete 6 corresponding
- it might be easier to make one off fixes or modify one model without unintended consequences.
- in the survey, 65% of respondents said they wanted more “Self-contained” code. This is what users want!
Arguments against copy paste
- we have to add and maintain about 4,500 lines of code. Could slow us down on adding new features/models.
- this might make the code harder to maintain. Some bugs/features that require 1 change will require 4 changes.
- the implementations could drift, making it more difficult for the same finetuning code to work with multiple models.
- Integration tests enforce that you can’t have big unintended consequences when you are refactoring, so the “unintended side effects” point is moot.
- It is actually harder to understand what is going on in pegasus/mbart/marian if the code is duplicated. Seeing that pegasus state dict is identical to bart makes reading the code easier. “It is annoying to have to read so many similar implementations and have no easy way to see the delta.”
- Do not repeat yourself/rule of 3 are best practice and there is no good reason for this lib to deviate.
- In the survey, people said they wanted better examples/more models faster, having to maintain more code would distract the team from those objectives.
- Maintaining equivalent test coverage after copy paste would require adding 111 tests!
Would love to hear other people’s opinions as there is some disagreement within the team.
You can vote here and comment below.
I am probably missing alternative solutions, pros and cons. I would love to hear!