Multi-Encoder Transformer

Hello, I want to create a multi-encoder transformer, but I don't know if this is possible in Hugging Face. I recently read "Multi-Encoder Transformer for Korean Abstractive Text Summarization", where the authors built a transformer with multiple encoders so that it can take multiple inputs, each giving information for a specific task. I want to create something like this.
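
To make the idea concrete, here is a rough sketch of the kind of architecture I mean, written in plain PyTorch rather than with any Hugging Face class. Everything in it is an assumption for illustration: the class name `MultiEncoderTransformer`, the sizes, the shared embedding, and especially the way the encoder outputs are combined (simple concatenation along the sequence axis), which is not necessarily what the paper does. Positional encodings and padding masks are left out to keep it short.

```python
import torch
import torch.nn as nn


class MultiEncoderTransformer(nn.Module):
    """Hypothetical sketch: several independent encoders feeding one decoder."""

    def __init__(self, vocab_size=1000, d_model=64, nhead=4,
                 num_layers=2, num_encoders=2):
        super().__init__()
        # One embedding shared by all inputs (an assumption; positional
        # encodings are omitted for brevity).
        self.embed = nn.Embedding(vocab_size, d_model)
        # One separate encoder stack per input source.
        self.encoders = nn.ModuleList([
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(
                    d_model, nhead, dim_feedforward=4 * d_model,
                    batch_first=True),
                num_layers=num_layers)
            for _ in range(num_encoders)
        ])
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(
                d_model, nhead, dim_feedforward=4 * d_model,
                batch_first=True),
            num_layers=num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, sources, target):
        # sources: list of LongTensors, each of shape (batch, src_len_i)
        # target:  LongTensor of shape (batch, tgt_len)
        memories = [enc(self.embed(src))
                    for enc, src in zip(self.encoders, sources)]
        # Assumed fusion strategy: concatenate along the sequence axis so the
        # decoder cross-attends over all encoder outputs at once.
        memory = torch.cat(memories, dim=1)
        tgt_len = target.size(1)
        # Causal mask: position i may only attend to positions <= i.
        causal_mask = torch.triu(
            torch.full((tgt_len, tgt_len), float("-inf"),
                       device=target.device),
            diagonal=1)
        hidden = self.decoder(self.embed(target), memory,
                              tgt_mask=causal_mask)
        return self.lm_head(hidden)


model = MultiEncoderTransformer()
document = torch.randint(0, 1000, (2, 12))   # first input, e.g. the article
extra = torch.randint(0, 1000, (2, 5))       # second input, e.g. keywords
summary = torch.randint(0, 1000, (2, 7))     # decoder input tokens
logits = model([document, extra], summary)
print(logits.shape)  # torch.Size([2, 7, 1000])
```

The two inputs can have different lengths because each goes through its own encoder before the outputs are joined. What I would like to know is whether something equivalent can be built from Hugging Face components.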