Smaller output vocabulary for GPT-2

iRove · July 18, 2020, 7:36am

I noticed that by default, GPT2LMHeadModel returns prediction scores of shape (batch_size, sequence_length, config.vocab_size) (docs link). Is there any way for me to limit the output vocabulary to only a subset of words?

I want to take the existing weights from GPT-2, but re-train a new top linear layer with a smaller vocabulary. I suppose I could mask the logits at the end, but then it feels like a waste of computational power to even predict them.

sgugger · July 20, 2020, 1:45pm

Note that this model ties the weights of its input embedding and its output layer (the LM head), so if you want to reuse the existing weights, you probably want to just mask the logits of the tokens you don’t want in your predictions.
Otherwise you can try to replace the last layer, but you will need to adapt the code in modeling_gpt2.py to do this.
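
To make the two options concrete, here is a minimal sketch in plain PyTorch. It does not load GPT-2: `wte` and `hidden` are random stand-ins for the tied embedding matrix and the transformer's final hidden states, and the toy sizes and `keep_ids` are hypothetical values chosen for illustration. It shows that a smaller head built from the kept rows of the tied matrix produces the same scores as masking the full logits, without computing the discarded columns.

```python
import torch

torch.manual_seed(0)

# Toy stand-ins (assumptions, not real GPT-2 weights or sizes).
vocab_size, hidden_size = 50, 8
wte = torch.randn(vocab_size, hidden_size)  # plays the role of the tied embedding/output matrix
hidden = torch.randn(2, 5, hidden_size)     # (batch_size, sequence_length, hidden_size)

keep_ids = torch.tensor([3, 7, 11, 42])     # hypothetical subset of token ids to keep

# Option 1: compute the full logits, then mask every token outside the subset.
full_logits = hidden @ wte.T                # (batch_size, sequence_length, vocab_size)
mask = torch.full((vocab_size,), float("-inf"))
mask[keep_ids] = 0.0
masked_logits = full_logits + mask          # disallowed tokens become -inf
masked_probs = masked_logits.softmax(dim=-1)

# Option 2: build a smaller output layer from the kept rows of the tied matrix.
small_head = torch.nn.Linear(hidden_size, len(keep_ids), bias=False)
with torch.no_grad():
    small_head.weight.copy_(wte[keep_ids])  # a copy, so no longer tied to the embedding
small_logits = small_head(hidden)           # (batch_size, sequence_length, len(keep_ids))
small_probs = small_logits.softmax(dim=-1)
```

With the smaller head, output index `i` corresponds to token id `keep_ids[i]`, so training labels would need to be remapped to that range. Because the head is a copy, fine-tuning it will let it drift away from the input embedding.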
