Is there any way to train a basic Transformer (the architecture presented in the original Attention Is All You Need paper)? I'm trying to set up a translation baseline on my custom dataset (no previously trained models exist for the language). If anyone can point me to a ready-made implementation of just the basic Transformer, it would be very helpful.
I think OpenNMT is the better framework for something like that. It lets you easily train the kind of model you describe.
The transformers library is geared more towards specific, specialized transformer architectures.
Thanks! I've used fairseq a few times. I think OpenNMT is fairly similar: both feel somewhat black-box, without full control over the entire pipeline. That's why I wanted to know if something like this is possible with transformers, since it's native Python rather than a command-line tool.
And this is the TF2/Keras equivalent of @abhishek's recommendation.
Thank you! I implemented mine from scratch in PyTorch and Lightning with the help of bentrevett's notebooks and the source code of BART in the transformers library.
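For anyone landing on this thread later: if you just want the vanilla encoder–decoder architecture in plain PyTorch without a framework, `torch.nn.Transformer` already implements the core of Attention Is All You Need, and you only need to wrap it with embeddings, positions, and an output projection. Here is a minimal sketch (hyperparameters are illustrative, and learned positional embeddings are used for brevity instead of the paper's sinusoidal ones):

```python
import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Minimal encoder-decoder Transformer wrapper around nn.Transformer.

    A sketch for a translation baseline; hyperparameters are not tuned,
    and learned positional embeddings stand in for sinusoidal ones.
    """
    def __init__(self, src_vocab, tgt_vocab, d_model=128, nhead=4,
                 num_layers=2, dim_ff=256, max_len=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)  # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt):
        # src, tgt: (batch, seq_len) token-id tensors
        pos_s = torch.arange(src.size(1), device=src.device)
        pos_t = torch.arange(tgt.size(1), device=tgt.device)
        src_x = self.src_emb(src) + self.pos_emb(pos_s)
        tgt_x = self.tgt_emb(tgt) + self.pos_emb(pos_t)
        # causal mask so the decoder cannot attend to future target tokens
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(src_x, tgt_x, tgt_mask=tgt_mask)
        return self.out(h)  # (batch, tgt_len, tgt_vocab) logits

model = Seq2SeqTransformer(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 10))  # (batch, src_len)
tgt = torch.randint(0, 1000, (2, 8))   # (batch, tgt_len)
logits = model(src, tgt)
print(logits.shape)  # torch.Size([2, 8, 1000])
```

From there you would train it with `nn.CrossEntropyLoss` on shifted target sequences, exactly as in the paper; the Lightning wrapper mentioned above is just a training-loop convenience on top of this.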
Thank you for the resources.