How is the encoding done for transformers? What encoder is used?

I’ve just found a thread that might be of interest to you: https://discuss.huggingface.co/t/transformer-architecture-and-theory/14558/2.

On the other hand, two additional resources come to mind: the paper "Attention Is All You Need" and the book "Natural Language Processing with Transformers", in which you can find good diagrams that explain the encoder. The book has a companion repository on GitHub; although not all chapters have been released yet, you can see the images and some code (https://github.com/nlp-with-transformers/notebooks/blob/main/03_transformer-anatomy.ipynb).
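
In case a concrete example helps while you read those, here is a minimal sketch of the encoding pipeline using the Hugging Face `transformers` library. The checkpoint `bert-base-uncased` is just an illustrative choice; any encoder model would work the same way:

```python
# Minimal sketch: how text is encoded by a transformer encoder.
# bert-base-uncased is only an example checkpoint, not the only option.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

text = "Transformers encode text into contextual vectors."

# Step 1: the tokenizer converts text into input IDs plus an attention mask.
inputs = tokenizer(text, return_tensors="pt")
print(inputs["input_ids"])

# Step 2: the encoder maps the IDs to embeddings, adds positional
# information, and runs them through stacked self-attention +
# feed-forward layers.
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per token: shape (batch, seq_len, hidden_size).
print(outputs.last_hidden_state.shape)
```

In short, the tokenizer does the text-to-IDs encoding, and the encoder stack turns those IDs into one contextual vector per token, which is exactly the part the book's diagrams and the `03_transformer-anatomy` notebook walk through.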