Way to train a basic Transformer

Is there a way to train a basic Transformer (the architecture presented in the original "Attention Is All You Need" paper)? I'm trying to set up a translation baseline on my custom dataset (no previously trained models exist for the language). If anyone can point me to a ready-made implementation of just the basic Transformer, that would be very helpful.


I think OpenNMT is a better framework for something like that. It makes it easy to train the kind of model you describe. transformers is geared more toward specialized, specific transformer architectures.


Thanks! I've used fairseq a few times, and I think OpenNMT is similar: both feel somewhat black-box, without full control over the entire pipeline. That's why I wanted to know whether this is possible with transformers, since it is native Python rather than a set of command-line tools.

Have you seen this: https://pytorch.org/tutorials/beginner/transformer_tutorial.html ?
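The core of that tutorial's approach can be sketched in a few lines with `torch.nn.Transformer`. This is an illustrative minimal version, not the tutorial's actual code: the vocabulary sizes and hyperparameters are made up, and positional encodings (which a real model needs) are omitted for brevity.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not from the tutorial.
SRC_VOCAB, TGT_VOCAB, D_MODEL = 1000, 1000, 512

class Seq2SeqTransformer(nn.Module):
    """Vanilla encoder-decoder Transformer for translation.

    Note: positional encodings are omitted here for brevity;
    a usable model must add them to the embeddings.
    """
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, D_MODEL)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, D_MODEL)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            dim_feedforward=2048, batch_first=True)
        self.out = nn.Linear(D_MODEL, TGT_VOCAB)

    def forward(self, src, tgt):
        # Causal mask so the decoder cannot attend to future tokens.
        mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(self.src_emb(src), self.tgt_emb(tgt), tgt_mask=mask)
        return self.out(h)

model = Seq2SeqTransformer()
src = torch.randint(0, SRC_VOCAB, (2, 7))  # (batch, src_len)
tgt = torch.randint(0, TGT_VOCAB, (2, 5))  # (batch, tgt_len)
logits = model(src, tgt)                   # (batch, tgt_len, TGT_VOCAB)
```

Training then reduces to the usual cross-entropy loop over (source, shifted-target) pairs.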


And this is the TF2/Keras equivalent of @abhishek's recommendation :smiley:


Thank you! I implemented mine from scratch in PyTorch and Lightning, with the help of bentrevett's notebooks and the source code of BART in the transformers library.
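For anyone landing here later: a randomly initialized encoder-decoder can also be built directly in transformers from a config, which answers the original question without any command-line tooling. A minimal sketch (the sizes below are illustrative, not what anyone in this thread used):

```python
from transformers import BartConfig, BartForConditionalGeneration

# Small, randomly initialized BART-style encoder-decoder
# (no pretrained weights), usable as a from-scratch baseline.
# All hyperparameters here are made up for the example.
config = BartConfig(
    vocab_size=1000,
    d_model=64,
    encoder_layers=2,
    decoder_layers=2,
    encoder_attention_heads=4,
    decoder_attention_heads=4,
    encoder_ffn_dim=128,
    decoder_ffn_dim=128,
)
model = BartForConditionalGeneration(config)

# The model trains like any PyTorch module: passing `labels`
# makes it return a cross-entropy loss alongside the logits.
```

From here you can plug the model into a plain PyTorch training loop or the `Trainer` class.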

Thank you for the resources.
