If I plan to use a GPT2 base model and call its forward method with an explicit inputs_embeds, does that mean I can set vocab_size=1? For example, suppose some preprocessing has converted my length-L text T into a 1 x L x 768 tensor E. Am I right that model(inputs_embeds=E) combines my explicit E with the positional embeddings in model.wpe but ignores model.wte entirely?
If this is not right, what is the relationship between vocab_size and explicit embeddings?
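One way to check this empirically is the small sketch below. It assumes the Hugging Face transformers `GPT2Model` class and a randomly initialized model built from a `GPT2Config` (no pretrained weights). The length `L = 10` and the random `E` are placeholders standing in for the real preprocessing output. The sketch overwrites `wte` after a first forward pass and checks whether the output changes. It then does the same for `wpe`.

```python
import torch
from transformers import GPT2Config, GPT2Model

torch.manual_seed(0)

# Placeholder sizes: L positions, 768-dim embeddings (the GPT-2 base width).
L = 10
config = GPT2Config(vocab_size=1, n_embd=768, n_positions=1024)
model = GPT2Model(config)  # random init; wte is an Embedding(1, 768)
model.eval()               # disable dropout so repeated passes are deterministic

# E stands in for the 1 x L x 768 tensor produced by your own preprocessing.
E = torch.randn(1, L, config.n_embd)

with torch.no_grad():
    out = model(inputs_embeds=E).last_hidden_state

    # Overwrite the token-embedding table: if wte is unused, the output is unchanged.
    model.wte.weight.normal_()
    out_new_wte = model(inputs_embeds=E).last_hidden_state

    # Overwrite the positional-embedding table: wpe is still added to E.
    model.wpe.weight.normal_()
    out_new_wpe = model(inputs_embeds=E).last_hidden_state

wte_ignored = torch.allclose(out, out_new_wte)
wpe_used = not torch.allclose(out_new_wte, out_new_wpe)
print(model.wte.weight.shape, out.shape, wte_ignored, wpe_used)
```

This only exercises `GPT2Model`. With `GPT2LMHeadModel`, the output head is tied to `wte` by default, so `vocab_size` would also set the size of the logits dimension.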