How to use different vocabs for the encoder and decoder?

In my case, the target tokens are only a small subset of the full vocab file, so using a smaller vocab on the decoder side would be beneficial: it shrinks the decoder's final projection layer and makes the softmax more efficient. To achieve this, I suppose the decoder needs a tokenizer with a smaller vocab, which means the decoder's embedding layer would also be smaller. Is there an existing way to implement this? If not, do I need to train a specific tokenizer on my own and replace the pretrained decoder embedding layer?
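To make the question concrete, here is a rough sketch of what I imagine, using the transformers API. The checkpoint name, `target_texts`, and the `vocab_size` of 5000 are just placeholders for my own setup, and I'm not sure this is the intended way to do it:

```python
from transformers import AutoTokenizer, EncoderDecoderModel

# Full-size pretrained tokenizer for the encoder side.
encoder_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Train a smaller tokenizer for the decoder side on my target corpus.
# target_texts is a placeholder for my own list of target strings.
target_texts = ["example target sentence", "another target sentence"]
decoder_tokenizer = encoder_tokenizer.train_new_from_iterator(
    target_texts, vocab_size=5000
)

# Tie a pretrained encoder and decoder together.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)

# Shrink the decoder's embedding table and output projection to the new vocab.
# Note: the pretrained decoder embeddings no longer line up with the new
# tokenizer's IDs, so these weights would effectively have to be re-learned.
model.decoder.resize_token_embeddings(len(decoder_tokenizer))

# Point the generation config at the smaller tokenizer's special tokens.
model.config.decoder_start_token_id = decoder_tokenizer.cls_token_id
model.config.pad_token_id = decoder_tokenizer.pad_token_id
```

Is something along these lines the right direction, or is there a cleaner built-in mechanism?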
