Adding a classification head to M2M100's decoder

Is there a tutorial/example for adding a second classification head to the decoder of an encoder-decoder model like M2M100?

Unless I missed it, the recently released Hugging Face book only covers adding a second head to an encoder-only model like BERT.

I can see that I should essentially be recreating M2M100ForConditionalGeneration, adding my second head in __init__() and forward(). One thing I’m unsure about is how much of the original class’s bells & whistles I should include in my own version. What’s necessary, and what’s not?
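For concreteness, here is a rough sketch of the kind of thing I have in mind. It subclasses M2M100ForConditionalGeneration rather than copying it, so the original forward() is reused as-is. The class name, the output class, the num_class_labels and class_labels arguments, and the choice of a per-token classifier on the decoder's last hidden state are all my own assumptions for illustration, not anything taken from the library or the book.

```python
from dataclasses import dataclass
from typing import Optional

import torch
from torch import nn
from transformers import M2M100ForConditionalGeneration
from transformers.modeling_outputs import Seq2SeqLMOutput


@dataclass
class Seq2SeqLMWithClassifierOutput(Seq2SeqLMOutput):
    """Hypothetical output type: the usual seq2seq LM fields plus the second head."""

    class_logits: Optional[torch.FloatTensor] = None
    class_loss: Optional[torch.FloatTensor] = None


class M2M100WithDecoderClassifier(M2M100ForConditionalGeneration):
    """Hypothetical subclass: keeps the LM head and adds a per-token classifier."""

    def __init__(self, config, num_class_labels=2):
        super().__init__(config)
        # Second head, applied to every decoder position (an assumption;
        # a sequence-level head would pool the hidden states first).
        self.classifier = nn.Linear(config.d_model, num_class_labels)

    def forward(self, *args, class_labels=None, **kwargs):
        # The second head needs the decoder hidden states, so request them
        # and a dict-style output regardless of what the caller asked for.
        kwargs["output_hidden_states"] = True
        kwargs["return_dict"] = True
        outputs = super().forward(*args, **kwargs)

        # Last entry is the decoder's final hidden state:
        # (batch, target_len, d_model).
        decoder_hidden = outputs.decoder_hidden_states[-1]
        class_logits = self.classifier(decoder_hidden)

        class_loss = None
        if class_labels is not None:
            # Per-token cross-entropy; -100 marks positions to ignore.
            class_loss = nn.functional.cross_entropy(
                class_logits.reshape(-1, class_logits.size(-1)),
                class_labels.reshape(-1),
                ignore_index=-100,
            )

        return Seq2SeqLMWithClassifierOutput(
            **outputs, class_logits=class_logits, class_loss=class_loss
        )
```

With this approach the translation loss (outputs.loss) and the classification loss come back separately, and combining them for training would be up to me. I have not checked how this interacts with generate() or with loading pretrained weights via from_pretrained().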

I could get into more detail about my use case, if needed.

Thanks!