With respect to GPT-2 training, is it safe to assume that the tokens following the special token will be the first to be fed into the decoder? For example, say I have ["Hello", "", "Hello", "How", "are", "you"]. The first "Hello" is used in positional encoding, but the first token to be sent into the decoder block is the "" token, right?