Transformers - Codex - context length of 4096 tokens

How does Codex, a descendant of GPT-3, allow a context length of 4096 tokens while GPT-3 allows only 2048, given that both use the Transformer architecture?

I have gone through the OpenAI Codex paper, but couldn't find any information related to this. Could anyone explain how this token limit was increased and what technique was used?
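For context, my current understanding is that in a GPT-style decoder the context window is bounded by the size of the learned positional-embedding table, and that attention cost grows quadratically with sequence length. Here is a minimal PyTorch sketch of what I mean; the class and parameter names are my own, not from the Codex paper:

```python
import torch
import torch.nn as nn

class TinyGPTEmbeddings(nn.Module):
    """Token + learned positional embeddings, as in GPT-style decoders."""

    def __init__(self, vocab_size=50257, d_model=768, max_positions=2048):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        # The maximum context length is fixed by this table's first dimension.
        self.pos_emb = nn.Embedding(max_positions, d_model)

    def forward(self, input_ids):
        seq_len = input_ids.size(1)
        positions = torch.arange(seq_len, device=input_ids.device)
        return self.token_emb(input_ids) + self.pos_emb(positions)

# GPT-3-like setup with 2048 positions; a 4096-token setup just uses a larger
# table, at the cost of more parameters and roughly 4x the attention FLOPs
# per layer for a full-length sequence. (Illustrative numbers only.)
gpt3_like = TinyGPTEmbeddings(max_positions=2048)
codex_like = TinyGPTEmbeddings(max_positions=4096)
```

So my question is essentially whether the jump to 4096 was done by simply training with a larger position table like this, or whether a different technique was involved.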
