Hey Philipp and everyone else,
I've been looking through the Optimum codebase and can't help admiring the design patterns used throughout it.
I know some of these ideas come from HF. Still, I was wondering if there are any docs regarding the advantages of the from_pretrained factory design pattern over simple object construction via `__init__`?
Hi @vblagoje, thanks for your admiration of optimum!
Optimum, as an extension of Transformers, enables speedups in training and inference with techniques like graph optimization, quantization, distillation, etc. It also allows users to easily leverage different accelerator hardware.
You can find detailed documentation here: 🤗 Optimum
And an end-to-end example with some performance results in Philipp’s blog: Optimizing Transformers with Hugging Face Optimum
We are working on adding more comprehensive benchmark results as a reference for users, so stay tuned!
@Jingya, thanks for your reply. However, I was referring specifically to the use of the from_pretrained design pattern as an object factory rather than `__init__()`. Did anyone capture the rationale behind this approach?
I assume you are talking more about the transformers implementation than the optimum one. You can definitely still use a model class's `__init__`, but from_pretrained is special. I wrote about it here: python - Why we need the init_weight function in BERT pretrained model in Huggingface Transformers? - Stack Overflow
Right, but in more abstract terms: why is this factory approach better than `__init__`? It allows more "room" to prepare various objects before the `__init__` invocation. As you said, it "finds the correct base model class to initialise", but what are the other advantages at a more abstract level, unrelated to transformers/optimum?
It is not the same as `__init__`. All the models still have an `__init__` method. If you want, you can still create an empty model from scratch with a config, something like:
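For example (a minimal sketch, assuming transformers and PyTorch are installed):

```python
from transformers import BertConfig, BertModel

# Build a config with default architecture hyperparameters.
config = BertConfig()

# Calling __init__ directly gives a model with randomly initialized
# weights - no checkpoint is downloaded or loaded.
model = BertModel(config)

print(config.hidden_size)  # 768, the BERT-base default
```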
from_pretrained is a utility function for using pre-trained checkpoints and configs. So it is NOT a replacement for an `__init__` function.
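To illustrate the abstract point in pure Python (all names here are illustrative, not from transformers): a classmethod factory can read external state, resolve the right concrete class, and prepare arguments before construction, none of which fits naturally inside `__init__` itself.

```python
import json
import os
import tempfile

class Model:
    # Simple registry mapping a saved "model_type" string to a class.
    registry = {}

    def __init__(self, hidden_size):
        # __init__ still exists: it only wires up an already-decided object.
        self.hidden_size = hidden_size

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Model.registry[cls.__name__] = cls

    @classmethod
    def from_pretrained(cls, path):
        # The factory inspects a saved config, picks the concrete subclass,
        # and constructs it - then it could go on to load saved weights.
        with open(os.path.join(path, "config.json")) as f:
            config = json.load(f)
        model_cls = Model.registry[config["model_type"]]
        return model_cls(config["hidden_size"])

class TinyModel(Model):
    pass

# Save a config to disk, then let the factory reconstruct the right class.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "config.json"), "w") as f:
        json.dump({"model_type": "TinyModel", "hidden_size": 4}, f)
    model = Model.from_pretrained(d)
    print(type(model).__name__, model.hidden_size)  # TinyModel 4
```

The caller never needs to know which subclass the checkpoint corresponds to; that decision lives in the factory, while `__init__` stays a plain constructor.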