Hey,
I’d recommend taking a look at this repo: agemagician/CodeTrans (Pretrained Language Models for Source code) by @agemagician. If I understand it correctly, this repo uses transformer models for protein sequences.
Also, taking a look at those models:
might help. Not sure if there is a notebook on doing protein sequence LM, maybe @agemagician has a good pointer by chance