XLNet recurrence mechanism on long sequences

Hello,

Can anyone tell me what happens when we feed an input sequence of more than 512 tokens to an XLNet model?
As far as I can tell, the model is currently applied to the whole sequence as is (without any chunking)…

From the paper, I understood that the implementation would “internally” chunk the sequence, apply the model to each chunk, and re-use the cached hidden states from the previous chunk.

Is this actually the case? Or do I need to chunk the sequence beforehand myself and feed each chunk to the model together with the cached hidden states from the previous chunk (mems)?
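
To make that second option concrete, here is roughly what I have in mind (a minimal sketch, assuming the Hugging Face XLNetModel API where mems can be passed in and are returned when mem_len is set and use_mems=True; the chunk size of 512 and the model name are just for illustration):

```python
import torch
from transformers import XLNetTokenizer, XLNetModel

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
# mem_len controls how many cached hidden states are kept per layer
model = XLNetModel.from_pretrained("xlnet-base-cased", mem_len=512)
model.eval()

long_text = "..."  # some document longer than 512 tokens
input_ids = tokenizer(long_text, return_tensors="pt").input_ids

chunk_size = 512
mems = None
chunk_outputs = []
with torch.no_grad():
    for start in range(0, input_ids.size(1), chunk_size):
        chunk = input_ids[:, start : start + chunk_size]
        out = model(chunk, mems=mems, use_mems=True)
        mems = out.mems  # cached hidden states, fed to the next chunk
        chunk_outputs.append(out.last_hidden_state)
```

Is this manual loop what I am expected to do, or does the model handle it for me?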

When I check the implementation, I see that the attention layer is applied directly to the whole input sequence. Only the feed-forward layer is applied to each chunk separately, but there seems to be only one chunk even for long sequences with the default settings.
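
For reference, my understanding is that this per-chunk feed-forward is driven by the generic chunk_size_feed_forward config attribute (default 0, i.e. the whole sequence in a single chunk), and that it only splits the feed-forward computation to save memory rather than implementing the recurrence. Please correct me if that assumption is wrong:

```python
from transformers import XLNetConfig, XLNetModel

# chunk_size_feed_forward > 0 splits only the feed-forward pass into chunks;
# the attention still sees the full input sequence.
config = XLNetConfig.from_pretrained("xlnet-base-cased", chunk_size_feed_forward=128)
model = XLNetModel(config)  # randomly initialised, just to illustrate the config knob
```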

Thanks!