Using transformers (BERT, RoBERTa) without an embedding layer

Hmm, that still doesn’t quite do it, unless I’m missing something.
It does allow masking within a sequence, but you can only mask one amino acid per sequence, and the output is not the full probability distribution: you only get the top-5 probabilities for that single masked amino acid.
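
For what it's worth, one possible way around both limits is to skip the top-5 output and work from the raw per-position scores. Masked language models in the transformers library return a `logits` tensor of shape (batch, sequence length, vocabulary size), so a softmax over the vocabulary axis at each masked position gives the full distribution for as many masks as you like. Below is a minimal sketch of just that post-processing step in plain Python. The function names, the toy logits, and the choice of token id 0 as the mask are all made up for illustration; real logits would come from the model for one sequence.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_probabilities(logits, input_ids, mask_token_id):
    """Return {position: full probability vector} for every masked position.

    logits: one row of vocab_size floats per token position
            (hypothetical stand-in for a model's output for one sequence).
    input_ids: token ids of the same sequence, with masks already inserted.
    """
    return {
        pos: softmax(logits[pos])
        for pos, tok in enumerate(input_ids)
        if tok == mask_token_id
    }


# Toy data (invented): 4 positions, vocabulary of 3 tokens, id 0 is the mask.
toy_logits = [
    [0.0, 1.0, 2.0],
    [1.0, 1.0, 1.0],
    [2.0, 0.0, 0.0],
    [0.5, 0.5, 3.0],
]
toy_ids = [2, 0, 1, 0]  # positions 1 and 3 are masked

probs = masked_probabilities(toy_logits, toy_ids, mask_token_id=0)
for pos, dist in probs.items():
    print(pos, [round(p, 3) for p in dist])
```

Each entry in `probs` is a distribution over the whole vocabulary, so nothing is cut off at five candidates. Note that the positions are scored independently here; this is not a joint probability over all masked residues.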