Support for LLaMA in EncoderDecoder framework

I’m trying to use LLaMA as a drop-in replacement for GPT2 in my ViT-GPT2 model.

After seeing Using FNet model in Encoder Decoder Models · Issue #22308 · huggingface/transformers · GitHub, it seems HuggingFace doesn’t plan to add support for newer models in the EncoderDecoder framework, and that I should adapt the model to suit my own needs.

I’m planning to follow the steps described in Trying to add support for GPT2 as decoder in EncoderDecoder model · Issue #4483 · huggingface/transformers · GitHub.

Are there any gotchas I should know about?


Hey, did you try to do it? I’m trying to merge a bge-small-1.5 encoder with a Llama2-1B decoder using EncoderDecoderModel. The issue is that Llama2 is decoder-only and doesn’t implement cross-attention, so EncoderDecoderModel won’t let me join the two.
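For anyone wondering what the missing piece actually is: cross-attention is the extra sublayer in an encoder-decoder stack where the decoder's queries attend over the encoder's hidden states (rather than over the decoder's own past tokens, as in self-attention). Below is a toy single-head sketch in plain Python with identity projections instead of learned W_q/W_k/W_v weights. It is purely illustrative and not the Transformers implementation, but it shows the operation a LLaMA decoder layer would need to gain for EncoderDecoderModel to accept it.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(a, b):
    # a: (n, k), b: (k, m) -> (n, m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def cross_attention(decoder_states, encoder_states):
    # Queries come from the decoder; keys and values come from the encoder.
    # Identity projections for brevity; a real layer learns W_q, W_k, W_v.
    d = len(decoder_states[0])
    scores = [[sum(q[i] * k[i] for i in range(d)) / math.sqrt(d)
               for k in encoder_states] for q in decoder_states]
    weights = [softmax(row) for row in scores]
    # Each decoder position gets a convex combination of encoder states.
    return matmul(weights, encoder_states)

enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 encoder positions, dim 2
dec = [[0.5, 0.5], [1.0, 0.0]]              # 2 decoder positions, dim 2
out = cross_attention(dec, enc)
print(len(out), len(out[0]))  # one encoder summary per decoder position: 2 2
```

In a decoder-only model like Llama2 this sublayer simply doesn't exist in its decoder layers, which is why EncoderDecoderModel refuses the pairing: there is no path for the encoder's output to enter the decoder.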
