I’ve built a basic Seq2Seq transformer from scratch in PyTorch (mainly to learn the architecture), and I’m wondering whether it’s possible to train this kind of model using the Hugging Face Tokenizers, Datasets, and Trainer classes. I’d rather not hand-code the tokenization and training loop myself.
Is there a way to do this with HF? Existing tutorials seem to exclusively use models that are already in the library.
If it is possible, which parts of training would I still need to handle myself (loss computation, masking, label smoothing, greedy decoding, etc.)?
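For context, here is a minimal sketch of what I’m hoping would work: a toy stand-in for my model (names like `TinySeq2Seq` are just placeholders) whose `forward` takes keyword arguments matching dataset columns and returns a dict with a `"loss"` key when labels are given, which I believe is the interface Trainer expects from a custom `nn.Module`:

```python
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """Toy stand-in for my from-scratch Seq2Seq transformer.

    My understanding is that Trainer can drive any nn.Module whose
    forward accepts keyword args matching the dataset columns and
    returns a dict containing "loss" when labels are provided.
    """

    def __init__(self, vocab_size=100, d_model=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=1, num_decoder_layers=1,
            dim_feedforward=64, batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, input_ids=None, decoder_input_ids=None, labels=None):
        src = self.embed(input_ids)
        tgt = self.embed(decoder_input_ids)
        # causal mask so the decoder can't attend to future positions
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        logits = self.out(hidden)
        out = {"logits": logits}
        if labels is not None:
            # flatten (batch, seq, vocab) -> (batch*seq, vocab) for CE loss
            out["loss"] = self.loss_fn(
                logits.reshape(-1, logits.size(-1)), labels.reshape(-1)
            )
        return out

# A single fake batch, shaped like what a data collator would produce
model = TinySeq2Seq()
batch = {
    "input_ids": torch.randint(0, 100, (2, 7)),
    "decoder_input_ids": torch.randint(0, 100, (2, 5)),
    "labels": torch.randint(0, 100, (2, 5)),
}
out = model(**batch)
print(out["loss"].shape, out["logits"].shape)
```

Is returning `{"loss": ..., "logits": ...}` like this enough for Trainer to pick up, or does the model have to subclass `PreTrainedModel`?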