Separate pre-trained encoder and decoder

Hi everyone, I am trying to build a sentence-to-sentence language model using a pretrained encoder-decoder architecture, where I want to add noise to the sentence embedding produced by the encoder. I want to use a pretrained transformer-based encoder-decoder for this. How can I do that using Hugging Face?

What I can see in Hugging Face is the very high-level Trainer API. I also tried going through the codebase to use the base model’s encoder and decoder separately, but found it very difficult. Is there any way to do this?
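To make the goal concrete, here is a rough sketch of the flow I have in mind. It is only an illustration under several assumptions: I picked T5 as the example model, the noise is Gaussian with an arbitrary scale of 0.1, and the "sentence embedding" is taken to be the encoder's per-token hidden states rather than a single pooled vector. The model here is a tiny randomly initialized one so the snippet runs without a download; for a pretrained model I would presumably load it with something like `T5ForConditionalGeneration.from_pretrained("t5-small")` instead.

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

torch.manual_seed(0)

# Assumption: a tiny, randomly initialized T5 stands in for the pretrained model.
# For real use, replace with T5ForConditionalGeneration.from_pretrained("t5-small").
config = T5Config(
    vocab_size=100, d_model=32, d_kv=8, d_ff=64,
    num_layers=2, num_heads=4,
    pad_token_id=0, decoder_start_token_id=0,
)
model = T5ForConditionalGeneration(config)
model.eval()  # disable dropout so the only randomness is the added noise

# Dummy token ids in place of a tokenized source/target sentence pair.
input_ids = torch.randint(1, 100, (2, 7))
attention_mask = torch.ones_like(input_ids)
labels = torch.randint(1, 100, (2, 5))

# 1) Run only the encoder.
encoder = model.get_encoder()
hidden = encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state

# 2) Add noise to the encoder output (noise type and scale are arbitrary choices).
noise_scale = 0.1
noisy = hidden + noise_scale * torch.randn_like(hidden)

# 3) Feed the noisy states to the decoder by passing encoder_outputs,
#    which makes the model skip its own encoder pass.
out = model(
    encoder_outputs=BaseModelOutput(last_hidden_state=noisy),
    attention_mask=attention_mask,
    labels=labels,
)
loss = out.loss      # cross-entropy against the labels
logits = out.logits  # shape: (batch, target_len, vocab_size)
```

If something like this is the right direction, `loss.backward()` could go inside an ordinary PyTorch training loop instead of the Trainer API, but I am not sure whether this is the recommended way.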