Way to train a basic Transformer

Is there a way to train a basic Transformer (the architecture presented in the original "Attention Is All You Need" paper)? I'm trying to set up a translation baseline on my custom dataset (no previously trained models exist for the language). If anyone can point me to a ready-made implementation of just the basic Transformer, that would be very helpful.


I think OpenNMT is a better framework for something like that. It makes it easy to train the kind of model you describe. transformers is geared more toward specialized, specific transformer architectures.


Thanks! I've used fairseq a few times, and I think OpenNMT is similar: both feel somewhat black-box, without full control over the entire pipeline. That's why I wanted to know whether this is possible with transformers, since it is native Python rather than a set of command-line tools.

Have you seen this: https://pytorch.org/tutorials/beginner/transformer_tutorial.html ?
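The core of that tutorial's approach can be sketched in a few lines with `torch.nn.Transformer`. This is an illustrative minimal version, not the tutorial's actual code: the vocabulary sizes and hyperparameters are made up, and positional encodings (which a real model needs) are omitted for brevity.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not from the tutorial.
SRC_VOCAB, TGT_VOCAB, D_MODEL = 1000, 1000, 512

class Seq2SeqTransformer(nn.Module):
    """Vanilla encoder-decoder Transformer for translation.

    Note: positional encodings are omitted here for brevity;
    a usable model must add them to the embeddings.
    """
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, D_MODEL)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, D_MODEL)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            dim_feedforward=2048, batch_first=True)
        self.out = nn.Linear(D_MODEL, TGT_VOCAB)

    def forward(self, src, tgt):
        # Causal mask so the decoder cannot attend to future tokens.
        mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(self.src_emb(src), self.tgt_emb(tgt), tgt_mask=mask)
        return self.out(h)

model = Seq2SeqTransformer()
src = torch.randint(0, SRC_VOCAB, (2, 7))  # (batch, src_len)
tgt = torch.randint(0, TGT_VOCAB, (2, 5))  # (batch, tgt_len)
logits = model(src, tgt)                   # (batch, tgt_len, TGT_VOCAB)
```

Training then reduces to the usual cross-entropy loop over (source, shifted-target) pairs.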


And this is the TF2/Keras equivalent of @abhishek's recommendation :smiley:


Thank you! I implemented mine from scratch in PyTorch and Lightning, with the help of bentrevett's notebooks and the source code of BART in the transformers library.
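For anyone landing here later: a randomly initialized encoder-decoder can also be built directly in transformers from a config, which answers the original question without any command-line tooling. A minimal sketch (the sizes below are illustrative, not what anyone in this thread used):

```python
from transformers import BartConfig, BartForConditionalGeneration

# Small, randomly initialized BART-style encoder-decoder
# (no pretrained weights), usable as a from-scratch baseline.
# All hyperparameters here are made up for the example.
config = BartConfig(
    vocab_size=1000,
    d_model=64,
    encoder_layers=2,
    decoder_layers=2,
    encoder_attention_heads=4,
    decoder_attention_heads=4,
    encoder_ffn_dim=128,
    decoder_ffn_dim=128,
)
model = BartForConditionalGeneration(config)

# The model trains like any PyTorch module: passing `labels`
# makes it return a cross-entropy loss alongside the logits.
```

From here you can plug the model into a plain PyTorch training loop or the `Trainer` class.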

Thank you for the resources.
