Nyströmformer or YOSO for Conditional Generation?

I would like to train a transformer (like T5) from scratch on a custom dataset for conditional generation. My dataset contains long sequences, on the order of thousands of tokens, so I am keen to try one of these newer models that implement linear-complexity self-attention. However, neither of these models has a (Model)ForConditionalGeneration class implemented. Would it be possible to add one for each of them?