I’m pretty new to the NLP field. During my study, I learned that the Hugging Face transformers library has the biggest ecosystem in this field. However, writing a model from scratch that meets the Hugging Face standards, things like Seq2SeqLMOutput and GenerationMixin, is quite difficult and tiresome. I did a lot of research and built a Hugging Face-compatible model from scratch in 20 days. Its source code is tiny and user-friendly, with the following features:
- Encoder-only Transformer
- Decoder-only Transformer
- Encoder-Decoder Transformer with Cross Attention Mask
- Self/Cross Key/Value Cache
- Auto mask handling
- GenerationMixin compatible, with drop-in generate() support (see the usage sketch after this list)
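To show what drop-in generate() support means in practice, here is a minimal usage sketch. The import path, class names, and config parameters (BasicFormerConfig, BasicFormerForCausalLM, hidden_size, num_layers) are my assumptions and may not match the actual repo; the tokenizer is borrowed from GPT-2 since the project ships no tokenizer of its own.

```python
from transformers import AutoTokenizer

# Hypothetical import path and class names; check the repo for the real ones.
from basicformer import BasicFormerConfig, BasicFormerForCausalLM

# BasicFormer ships no tokenizer, so borrow one from an existing checkpoint.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Build a small, randomly initialized model (sizes chosen arbitrarily here).
config = BasicFormerConfig(vocab_size=tokenizer.vocab_size, hidden_size=256, num_layers=4)
model = BasicFormerForCausalLM(config)

# Because the model follows the GenerationMixin contract, generate() works as-is,
# and the self-attention key/value cache is reused across decoding steps.
# (A freshly initialized model will of course produce gibberish.)
inputs = tokenizer("Hello, my name is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```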
BasicFormer is meant to serve as a backbone Transformer for enthusiasts who love to customize models from scratch. It can also be used as a tutorial for beginners who really want to understand the inner workings of Transformers and how the Hugging Face transformers library handles these structures.
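If you are wondering what “meeting the Hugging Face standard” actually involves, the skeleton below shows the general pattern for a decoder-only language model that plugs into generate(). This is my own simplified illustration, not the actual BasicFormer source: the class and attribute names are made up, and it skips the causal mask and key/value cache handling that a real model needs.

```python
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel, GenerationMixin
from transformers.modeling_outputs import CausalLMOutputWithPast


class TinyDecoderConfig(PretrainedConfig):
    model_type = "tiny_decoder"  # illustrative name only

    def __init__(self, vocab_size=32000, hidden_size=256, num_layers=2, num_heads=4, **kwargs):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.num_heads = num_heads
        super().__init__(**kwargs)


class TinyDecoderForCausalLM(PreTrainedModel, GenerationMixin):
    config_class = TinyDecoderConfig

    def __init__(self, config):
        super().__init__(config)
        self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size)
        block = nn.TransformerEncoderLayer(
            d_model=config.hidden_size, nhead=config.num_heads, batch_first=True
        )
        self.layers = nn.TransformerEncoder(block, num_layers=config.num_layers)
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

    def forward(self, input_ids, attention_mask=None, past_key_values=None, labels=None, **kwargs):
        # A real decoder would build a causal mask from attention_mask and
        # extend past_key_values here; this sketch simply recomputes everything.
        hidden = self.layers(self.embed_tokens(input_ids))
        logits = self.lm_head(hidden)
        loss = None
        if labels is not None:
            # Standard next-token prediction loss with shifted labels.
            loss = nn.functional.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)), labels[:, 1:].reshape(-1)
            )
        # Returning this output dataclass (rather than a bare tensor) is what
        # lets generate() and the Trainer understand the model's outputs.
        return CausalLMOutputWithPast(loss=loss, logits=logits, past_key_values=past_key_values)
```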
It’s a pure PyTorch model and doesn’t ship a tokenizer, so the test cases in the repo have to borrow tokenizers from other checkpoints. It also has some shortcomings (a sketch of how two of them might be patched follows the list):
- No Positional Encoding
- No tied weights between the input embedding layer and language model head.
- No sliding window cache support (that’s quite difficult to code, I must say).
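For instance, two of these gaps could plausibly be closed with a few lines like the ones below. The attribute names (embed_tokens, lm_head, embed_positions) are my assumptions about the module layout, so adjust them to whatever BasicFormer actually uses.

```python
import torch.nn as nn


def patch_positions_and_tie_weights(model, max_positions=1024):
    # Attribute names below are hypothetical; rename them to match the real modules.
    hidden_size = model.embed_tokens.embedding_dim

    # 1) Learned absolute positional embeddings. The forward pass would still need
    #    to add model.embed_positions(position_ids) to the token embeddings.
    model.embed_positions = nn.Embedding(max_positions, hidden_size)

    # 2) Weight tying: the LM head now shares the input embedding matrix,
    #    so updating one updates the other.
    model.lm_head.weight = model.embed_tokens.weight
    return model
```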
But I believe these can be customized and overcome. So take my model code and happy coding 🤗
P.S. English isn’t my mother tongue, so please forgive any grammar or expression errors.