I noticed that by default, GPT2LMHeadModel returns prediction scores of shape (batch_size, sequence_length, config.vocab_size) (docs link). Is there a way to limit the output vocabulary to only a subset of tokens?
I want to keep the existing GPT-2 weights but train a new top linear layer with a smaller vocabulary. I suppose I could mask the logits at the end, but computing scores over the full vocabulary only to discard most of them seems like a waste of compute.
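For what it's worth, here is a rough sketch of the head-replacement idea: freeze the transformer body and swap `lm_head` for a new linear layer over the smaller vocabulary. The subset size (`SMALL_VOCAB`) is a made-up placeholder, and I build the model from a fresh `GPT2Config` just to keep the snippet self-contained; in practice you would use `GPT2LMHeadModel.from_pretrained("gpt2")` to reuse the released weights, and you would need a mapping from your chosen token ids to the new output indices.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

SMALL_VOCAB = 1000  # hypothetical subset size

# Fresh config here so the sketch runs without a download;
# swap in from_pretrained("gpt2") to keep the trained weights.
model = GPT2LMHeadModel(GPT2Config())

# Freeze the transformer body so only the new head is trained.
for p in model.transformer.parameters():
    p.requires_grad = False

# Replace the (weight-tied) lm_head with a new, smaller linear layer.
model.lm_head = torch.nn.Linear(model.config.n_embd, SMALL_VOCAB, bias=False)

input_ids = torch.randint(0, model.config.vocab_size, (1, 8))
logits = model(input_ids).logits
print(logits.shape)  # now (1, 8, SMALL_VOCAB) instead of (1, 8, 50257)
```

Note that replacing `lm_head` unties it from the input embeddings, which is what you want here since the input side still uses the full tokenizer.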