Any language model that uses both encoder and decoder outputs for multi-task learning?

Hi guys,

Is there any language model that uses both the encoder and decoder outputs (losses) for multi-task learning?
For example, an NER (token classification / encoder) loss plus a translation (generation / decoder) loss.
(Maybe classifying named entities could help the model understand the input better for translation. Just an example!)

It probably wouldn't be too difficult to implement myself (rough sketch below), but it would be much easier if someone had already contributed a modeling_*.py for this to the library.
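
To make the idea concrete, here's a rough sketch of the kind of model I have in mind: a pretrained seq2seq model with an extra token-classification head on the encoder's last hidden states, trained on a weighted sum of both losses. The checkpoint name, label count, and `ner_weight` below are just placeholders, not a real implementation:

```python
import torch.nn as nn
from transformers import AutoModelForSeq2SeqLM

class Seq2SeqWithNERHead(nn.Module):
    """Sketch: pretrained seq2seq model + token-classification head on the encoder."""

    def __init__(self,
                 model_name="Helsinki-NLP/opus-mt-en-de",  # placeholder checkpoint
                 num_ner_labels=9,                         # placeholder label count
                 ner_weight=0.5):                          # placeholder loss weight
        super().__init__()
        self.seq2seq = AutoModelForSeq2SeqLM.from_pretrained(model_name)
        # Token-classification head over the encoder's hidden states
        self.ner_head = nn.Linear(self.seq2seq.config.d_model, num_ner_labels)
        self.ner_weight = ner_weight

    def forward(self, input_ids, attention_mask, labels=None, ner_labels=None):
        # Single forward pass; the encoder hidden states get reused for NER.
        outputs = self.seq2seq(input_ids=input_ids,
                               attention_mask=attention_mask,
                               labels=labels)  # translation (decoder) loss
        # (batch, src_len, hidden) -> (batch, src_len, num_ner_labels)
        ner_logits = self.ner_head(outputs.encoder_last_hidden_state)

        loss = outputs.loss
        if ner_labels is not None:
            ce = nn.CrossEntropyLoss(ignore_index=-100)  # -100 = ignored/padding tokens
            ner_loss = ce(ner_logits.view(-1, ner_logits.size(-1)),
                          ner_labels.view(-1))
            # Weighted sum of the two task losses
            loss = ner_loss if loss is None else loss + self.ner_weight * ner_loss
        return loss, outputs.logits, ner_logits
```

Of course the NER labels would have to be aligned with the translation model's subword tokenization, and `ner_weight` would need tuning so the auxiliary loss doesn't swamp the translation objective.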

Thanks!