Hello everyone,
I’d like to know if there’s a way to adjust the vocab size of a pre-trained Causal Language Model so that, for example, instead of being able to predict one of 50k words, it will be able to predict only n words (with n being predefined). Is this possible?
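In general, you would have to replace the embedding matrix, which stores the embeddings of all vocabulary items. The easiest way is to define a new matrix (see Embedding in the PyTorch 1.12 documentation) of the correct size (n x hidden) and initialize the vectors randomly. For a causal language model, the output layer that produces the logits has the same vocabulary dimension (it is often tied to the embedding matrix), so it has to be resized in the same way. The problem is that this more or less ruins the model: the remaining layers were trained against the old embeddings, so the model needs fine-tuning before its predictions are useful again.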
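As a rough sketch of the replacement step, assuming a toy model rather than a real pre-trained one: `ToyLM`, its attribute names (`embed`, `body`, `lm_head`) and the sizes below are made up for illustration, and a real model will name and structure these layers differently.

```python
import torch
import torch.nn as nn

hidden = 16      # hidden size of the toy model
old_vocab = 100  # stand-in for the original 50k vocabulary
n = 10           # new, predefined vocabulary size


class ToyLM(nn.Module):
    """Hypothetical stand-in for a pre-trained causal LM."""

    def __init__(self, vocab_size, hidden):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.body = nn.Linear(hidden, hidden)  # placeholder for the transformer layers
        self.lm_head = nn.Linear(hidden, vocab_size, bias=False)

    def forward(self, input_ids):
        return self.lm_head(torch.tanh(self.body(self.embed(input_ids))))


model = ToyLM(old_vocab, hidden)

# Replace both vocabulary-sized layers with randomly initialized ones of size n.
model.embed = nn.Embedding(n, hidden)
model.lm_head = nn.Linear(hidden, n, bias=False)

input_ids = torch.randint(0, n, (2, 5))  # token ids must now be in [0, n)
logits = model(input_ids)
print(logits.shape)  # torch.Size([2, 5, 10])
```

After this swap the model can only score n tokens per position, but the new layers are untrained, so some fine-tuning would be needed.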
If there is a large overlap between the old and new vocabularies, you can initialize the new embeddings from the corresponding old ones.
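A minimal sketch of that initialization, assuming you have token-to-id mappings for both vocabularies. The two dictionaries and the tiny sizes here are invented for illustration; with a real model the old matrix would come from the pre-trained checkpoint.

```python
import torch
import torch.nn as nn

hidden = 8
old_vocab = {"the": 0, "cat": 1, "sat": 2, "mat": 3}  # hypothetical old token -> id
new_vocab = {"cat": 0, "dog": 1, "the": 2}            # hypothetical new token -> id

old_emb = nn.Embedding(len(old_vocab), hidden)  # stands in for the pre-trained matrix
new_emb = nn.Embedding(len(new_vocab), hidden)  # randomly initialized by default

# Copy the old vector for every token that exists in both vocabularies.
# Tokens that are new ("dog" here) keep their random initialization.
with torch.no_grad():
    for token, new_id in new_vocab.items():
        old_id = old_vocab.get(token)
        if old_id is not None:
            new_emb.weight[new_id].copy_(old_emb.weight[old_id])
```

If the output layer is not tied to the embedding matrix, the same row-by-row copy would presumably be applied to its weights as well.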