Hi. I have a quick question regarding sequence-to-sequence models. At the end of the video, it shows that these models can be constructed by combining encoder models (e.g. BERT) and decoder models (e.g. GPT).
I was wondering: how can RoBERTa, an encoder-only model, be used both as an encoder and as a decoder?
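For context, here is a minimal sketch of the kind of setup I mean, using Hugging Face's `EncoderDecoderModel` to warm-start a seq2seq model from two `roberta-base` checkpoints (the checkpoint names and generation settings are just my assumptions for illustration):

```python
from transformers import EncoderDecoderModel, RobertaTokenizer

# Combine two RoBERTa checkpoints into one seq2seq model.
# The second checkpoint is loaded as a decoder: causal masking is
# enabled and cross-attention layers are added (randomly initialized,
# so the model needs fine-tuning before it produces useful output).
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "roberta-base", "roberta-base"
)
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

# Generation needs these set explicitly for the decoder side.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("How can an encoder-only model decode?", return_tensors="pt")
outputs = model.generate(**inputs, max_length=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Is this roughly what the video means, i.e. the RoBERTa weights are reused but the decoder side gets extra (new) cross-attention parameters and a causal attention mask?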