Forward Function Output of XGLMForCausalLM

Hello everyone,

I’m currently working with XGLM models, and I was wondering why the forward function of XGLMForCausalLM returns CausalLMOutputWithCrossAttentions instead of CausalLMOutputWithPast (which the causal LM heads of other decoder-only models use) or some other class. The name confused me because decoder-only models, unlike encoder-decoder models, do not have cross-attention layers.
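To make the question concrete, here is a minimal sketch of how the two output classes can be compared field by field. It assumes transformers and PyTorch are installed; field_names is just a helper name I made up for this post. As far as I can tell, the two classes differ only in the cross_attentions field.

```python
# Minimal sketch: compare the fields of the two output dataclasses.
# Assumes `transformers` and PyTorch are installed.
from dataclasses import fields

from transformers.modeling_outputs import (
    CausalLMOutputWithCrossAttentions,
    CausalLMOutputWithPast,
)


def field_names(output_cls):
    """Hypothetical helper: return the set of dataclass field names."""
    return {f.name for f in fields(output_cls)}


with_past = field_names(CausalLMOutputWithPast)
with_cross = field_names(CausalLMOutputWithCrossAttentions)

# Fields that exist in one class but not in the other.
extra_fields = with_cross - with_past
missing_fields = with_past - with_cross

print("only in CausalLMOutputWithCrossAttentions:", sorted(extra_fields))
print("only in CausalLMOutputWithPast:", sorted(missing_fields))
```

This only inspects the class definitions, so it needs no model download and says nothing about what XGLM actually puts into those fields at runtime.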

Could someone help me understand the differences between these classes and the design choice behind this? Thank you all!