Is teacher forcing used in GPT-2?

I have recently been studying GPT-2. Can someone tell me whether decoder-only models use teacher forcing the way encoder-decoder models do?
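
To make the question concrete, here is a minimal sketch of what teacher forcing would look like for a decoder-only model: the ground-truth tokens are the input, and the logits at position t are scored against the token at position t+1. This is plain Python, not the Hugging Face code; `shifted_lm_loss`, `log_softmax`, and the toy logits are hypothetical names and values made up for illustration.

```python
import math

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def shifted_lm_loss(logits, token_ids):
    """Mean next-token cross-entropy.

    logits[t] is assumed to be the model output after seeing token_ids[:t+1]
    (causal masking), so it is a prediction for token_ids[t+1].
    """
    # Drop the last position's logits (no next token to score against)
    # and the first token (no earlier position predicts it).
    shift_logits = logits[:-1]
    shift_labels = token_ids[1:]
    losses = [-log_softmax(row)[label]
              for row, label in zip(shift_logits, shift_labels)]
    return sum(losses) / len(losses)

# Toy example: vocabulary of 3 tokens, sequence of 3 ground-truth tokens.
token_ids = [0, 2, 1]
logits = [
    [0.0, 0.0, 0.0],  # after token 0: scored against token 2
    [0.0, 5.0, 0.0],  # after tokens 0, 2: scored against token 1
    [1.0, 1.0, 1.0],  # after the full sequence: not scored
]
loss = shifted_lm_loss(logits, token_ids)
print(loss)
```

Note that the inputs here are always the ground-truth tokens, never the model's own predictions, and the same token sequence serves as both input and label.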

I have looked at the Hugging Face implementation of GPT-2, and it uses the labels only to calculate the loss. How does the model