Hi @tueboesen,
Yes, it will work. It can give you results very close to MSA-based methods, and sometimes even better ones. If you combine it with MSA, it will give you even better results than MSA methods alone.
We have trained Transformer-XL, XLNet, BERT, ALBERT, ELECTRA, and T5 on the UniRef100 and BFD datasets. I would recommend simply using one of these models, because reaching good results requires a tremendous amount of computing power.
You can find them here:
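As a minimal sketch of using one of these pre-trained checkpoints with the `transformers` library: the preprocessing below (mapping rare amino acids to X and space-separating residues) follows the ProtTrans examples, while the checkpoint name `Rostlab/prot_bert` and the commented-out loading code are assumptions you should check against the model card.

```python
import re

def preprocess(seq: str) -> str:
    # ProtBert-style preprocessing (as in the ProtTrans examples):
    # map rare amino acids U, Z, O, B to X, then space-separate residues
    # so the tokenizer treats each amino acid as a separate token.
    seq = re.sub(r"[UZOB]", "X", seq.upper())
    return " ".join(seq)

# Loading the model itself (checkpoint name assumed; downloads weights):
# from transformers import BertModel, BertTokenizer
# tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
# model = BertModel.from_pretrained("Rostlab/prot_bert")
# inputs = tokenizer(preprocess("MKTAYIAKQR"), return_tensors="pt")
# embeddings = model(**inputs).last_hidden_state

print(preprocess("MKTUAYB"))  # -> "M K T X A Y X"
```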
You can find more details on our paper:
Facebook also trained RoBERTa on the UniRef50 dataset:
Unfortunately, we don’t have a notebook for training from scratch, but you can find more details on replicating our results here:
@patrickvonplaten:
You meant:
Not:
ProtTrans: Provides the SOTA pre-trained models for protein sequences.
CodeTrans: Provides the SOTA pre-trained models for computer source code.