ALiBi and Extrapolation


I read the ALiBi paper about extrapolating to longer sequence lengths, and I want to modify a given decoder-only transformer to use ALiBi attention. I understand the ALiBi implementation itself, but I'm stuck on the extrapolation part. In the paper they also test extrapolation with a standard positional-embedding network, and it doesn't work well. Okay, got it, ALiBi is better, but how did they actually do the extrapolation? I mean, if the model was trained with 512 input embeddings, how do I feed 1024 embeddings into it?
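To make concrete what I mean by "I understand the ALiBi implementation", here is my rough sketch of the bias in NumPy (the slope formula assumes the number of heads is a power of two, as in the paper; `alibi_bias` is just my own helper name):

```python
import numpy as np

def alibi_slopes(n_heads):
    # Per-head slopes: a geometric sequence starting at 2^(-8/n_heads),
    # as given in the paper for head counts that are powers of two.
    start = 2 ** (-8.0 / n_heads)
    return np.array([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads, seq_len):
    # Additive bias on attention scores: slope * -(i - j) for key j, query i.
    # Note that nothing here depends on a trained maximum length, so the same
    # function produces a valid bias for seq_len = 512 or 1024.
    distance = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]  # j - i
    bias = alibi_slopes(n_heads)[:, None, None] * distance[None, :, :]
    # Causal mask: a query at position i may only attend to keys with j <= i.
    bias = np.where(distance[None, :, :] > 0, -np.inf, bias)
    return bias  # shape (n_heads, seq_len, seq_len)
```

So the bias matrix itself can be built for any sequence length. What I don't get is the rest of the network: where do the extra 512 positions go at inference time?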
Can anyone help? Thank you so much!