Introducing BasicFormer, a backbone Transformer network I wrote from scratch

Repo: GitHub - icy-obelisk/basicformer: a basic Transformer without positional encoding, 100% Hugging Face compatible, created as a skeleton Transformer for newcomers who want to integrate new ideas/algorithms directly into the Hugging Face ecosystem.

I’m pretty new to the NLP field. During my studies, I learned that the Hugging Face Transformers library has the biggest ecosystem in this space. However, writing a model from scratch that meets Hugging Face conventions such as Seq2SeqLMOutput and GenerationMixin is quite difficult and tiresome. After a lot of research, I built a Hugging Face-compatible model from scratch in 20 days. Its source code is tiny and very beginner-friendly, with the following features:

  • Encoder-only Transformer
  • Decoder-only Transformer
  • Encoder-Decoder Transformer with Cross Attention Mask
  • Self/Cross Key/Value Cache
  • Auto mask handling
  • GenerationMixin compatibility with drop-in generate() support (see the sketch after this list)
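Because everything is wired through the standard Hugging Face interfaces, using the model should feel like using any other Transformers model. Below is a minimal sketch of what that might look like; the import path and the class/config names (BasicFormerConfig, BasicFormerForCausalLM, vocab_size) are placeholders I made up for illustration, so please check the repo for the real ones.

```python
# Sketch only: class names and import path are hypothetical, adapt to the repo.
import torch
from transformers import AutoTokenizer

from basicformer import BasicFormerConfig, BasicFormerForCausalLM  # hypothetical names

# BasicFormer ships without a tokenizer, so borrow an off-the-shelf one.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

config = BasicFormerConfig(vocab_size=tokenizer.vocab_size)  # hypothetical config field
model = BasicFormerForCausalLM(config)  # randomly initialized, so output is gibberish

inputs = tokenizer("Hello world", return_tensors="pt")

# If the model subclasses PreTrainedModel + GenerationMixin and returns
# HF-style outputs, generate() works out of the box and the self-attention
# K/V cache is reused across decoding steps.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```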

BasicFormer is meant to serve as a backbone Transformer for enthusiasts who love customizing models from scratch. It can also be used as a tutorial for beginners who want to understand the inner workings of Transformers and how the Hugging Face Transformers library handles these structures.
It is a pure PyTorch model and does not include a tokenizer, so the test cases in the repo have to borrow tokenizers from other models. It also has some shortcomings:

  • No positional encoding (one way to add it is sketched after this list)
  • No weight tying between the input embedding layer and the language model head
  • No sliding-window cache support (that one is quite difficult to code, I must say)
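To show that the first two gaps are not hard to close, here is a rough sketch of how learned positional embeddings and weight tying could be bolted on. The attribute names (pos_emb, lm_head) are hypothetical, and it assumes the model implements the standard get_input_embeddings() hook from PreTrainedModel; adapt the names to the actual code.

```python
# Rough sketch under the assumptions stated above -- not the repo's actual API.
import torch
import torch.nn as nn

def add_learned_positions(model, max_positions: int, hidden_size: int):
    """GPT-2 style learned positional embeddings added to the token embeddings."""
    model.pos_emb = nn.Embedding(max_positions, hidden_size)  # hypothetical attribute
    token_emb = model.get_input_embeddings()  # standard PreTrainedModel hook

    def embed(input_ids: torch.LongTensor) -> torch.Tensor:
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        return token_emb(input_ids) + model.pos_emb(positions)

    return embed  # swap this in for the plain embedding lookup inside forward()

def tie_weights(model):
    """Share one weight matrix between the input embeddings and the LM head."""
    model.lm_head.weight = model.get_input_embeddings().weight  # hypothetical lm_head
```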

But I believe all of these can be customized and overcome. So take the model code and happy coding 🤗

P.S. English isn’t my native language, so please forgive any grammar or phrasing mistakes.
