I’m pretty new to the NLP field. During my study, I learned that the Hugging Face transformers library has the biggest ecosystem in this field. However, writing a model from scratch that meets the Hugging Face standards, things like Seq2SeqLMOutput and GenerationMixin, is quite difficult and tiresome. I did a lot of research and built a Hugging Face-compatible model from scratch in 20 days. Its source code is tiny and user-friendly, with the following features:
- Encoder-only Transformer
- Decoder-only Transformer
- Encoder-Decoder Transformer with Cross Attention Mask
- Self/Cross Key/Value Cache
- Auto mask handling
- GenerationMixin compatible, with drop-in generate() support (see the usage sketch after this list)
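To show what drop-in generate() support means in practice, here is a minimal usage sketch. The import path, class names, and config parameters (BasicFormerConfig, BasicFormerForCausalLM, hidden_size, num_layers) are my assumptions and may not match the actual repo; the tokenizer is borrowed from GPT-2 since the project ships no tokenizer of its own.

```python
from transformers import AutoTokenizer

# Hypothetical import path and class names; check the repo for the real ones.
from basicformer import BasicFormerConfig, BasicFormerForCausalLM

# BasicFormer ships no tokenizer, so borrow one from an existing checkpoint.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Build a small, randomly initialized model (sizes chosen arbitrarily here).
config = BasicFormerConfig(vocab_size=tokenizer.vocab_size, hidden_size=256, num_layers=4)
model = BasicFormerForCausalLM(config)

# Because the model follows the GenerationMixin contract, generate() works as-is,
# and the self-attention key/value cache is reused across decoding steps.
# (A freshly initialized model will of course produce gibberish.)
inputs = tokenizer("Hello, my name is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```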
BasicFormer is meant to serve as a backbone Transformer for enthusiasts who love to customize models from scratch. It can also be used as a tutorial for beginners who really want to understand the inner workings of Transformers and how the Hugging Face transformers library handles these structures.
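If you are wondering what “meeting the Hugging Face standard” actually involves, the skeleton below shows the general pattern for a decoder-only language model that plugs into generate(). This is my own simplified illustration, not the actual BasicFormer source: the class and attribute names are made up, and it skips the causal mask and key/value cache handling that a real model needs.

```python
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel, GenerationMixin
from transformers.modeling_outputs import CausalLMOutputWithPast


class TinyDecoderConfig(PretrainedConfig):
    model_type = "tiny_decoder"  # illustrative name only

    def __init__(self, vocab_size=32000, hidden_size=256, num_layers=2, num_heads=4, **kwargs):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.num_heads = num_heads
        super().__init__(**kwargs)


class TinyDecoderForCausalLM(PreTrainedModel, GenerationMixin):
    config_class = TinyDecoderConfig

    def __init__(self, config):
        super().__init__(config)
        self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size)
        block = nn.TransformerEncoderLayer(
            d_model=config.hidden_size, nhead=config.num_heads, batch_first=True
        )
        self.layers = nn.TransformerEncoder(block, num_layers=config.num_layers)
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

    def forward(self, input_ids, attention_mask=None, past_key_values=None, labels=None, **kwargs):
        # A real decoder would build a causal mask from attention_mask and
        # extend past_key_values here; this sketch simply recomputes everything.
        hidden = self.layers(self.embed_tokens(input_ids))
        logits = self.lm_head(hidden)
        loss = None
        if labels is not None:
            # Standard next-token prediction loss with shifted labels.
            loss = nn.functional.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)), labels[:, 1:].reshape(-1)
            )
        # Returning this output dataclass (rather than a bare tensor) is what
        # lets generate() and the Trainer understand the model's outputs.
        return CausalLMOutputWithPast(loss=loss, logits=logits, past_key_values=past_key_values)
```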
It’s a pure PyTorch model and doesn’t ship a tokenizer, so the test cases in the repo have to borrow tokenizers from other checkpoints. It also has some shortcomings (a sketch of how two of them might be patched follows the list):
- No Positional Encoding
- No tied weights between the input embedding layer and language model head.
- No sliding window cache support (that’s quite difficult to code, I must say).
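For instance, two of these gaps could plausibly be closed with a few lines like the ones below. The attribute names (embed_tokens, lm_head, embed_positions) are my assumptions about the module layout, so adjust them to whatever BasicFormer actually uses.

```python
import torch.nn as nn


def patch_positions_and_tie_weights(model, max_positions=1024):
    # Attribute names below are hypothetical; rename them to match the real modules.
    hidden_size = model.embed_tokens.embedding_dim

    # 1) Learned absolute positional embeddings. The forward pass would still need
    #    to add model.embed_positions(position_ids) to the token embeddings.
    model.embed_positions = nn.Embedding(max_positions, hidden_size)

    # 2) Weight tying: the LM head now shares the input embedding matrix,
    #    so updating one updates the other.
    model.lm_head.weight = model.embed_tokens.weight
    return model
```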
But I believe these can be customized and overcome. So take my model code and happy coding 🤗
P.S. English isn’t my mother tongue, so please forgive any grammar or expression errors.