How to use different vocabs for the encoder and decoder?

In my case, the target tokens are only a small subset of the full vocab file, so using a smaller vocab on the decoder side would be beneficial: it shrinks the decoder's final projection layer and makes the softmax more efficient. To achieve this, I suppose the decoder needs a tokenizer with a smaller vocab, which means the decoder's embedding layer would also be smaller. Is there an existing way to implement this? If not, do I need to train a specific tokenizer on my own and replace the pretrained decoder embedding layer?
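To make the question concrete, here is a rough sketch of what I imagine, using the transformers API. The checkpoint name, `target_texts`, and the `vocab_size` of 5000 are just placeholders for my own setup, and I'm not sure this is the intended way to do it:

```python
from transformers import AutoTokenizer, EncoderDecoderModel

# Full-size pretrained tokenizer for the encoder side.
encoder_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Train a smaller tokenizer for the decoder side on my target corpus.
# target_texts is a placeholder for my own list of target strings.
target_texts = ["example target sentence", "another target sentence"]
decoder_tokenizer = encoder_tokenizer.train_new_from_iterator(
    target_texts, vocab_size=5000
)

# Tie a pretrained encoder and decoder together.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)

# Shrink the decoder's embedding table and output projection to the new vocab.
# Note: the pretrained decoder embeddings no longer line up with the new
# tokenizer's IDs, so these weights would effectively have to be re-learned.
model.decoder.resize_token_embeddings(len(decoder_tokenizer))

# Point the generation config at the smaller tokenizer's special tokens.
model.config.decoder_start_token_id = decoder_tokenizer.cls_token_id
model.config.pad_token_id = decoder_tokenizer.pad_token_id
```

Is something along these lines the right direction, or is there a cleaner built-in mechanism?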
