Sliding a Transformer over a long sequence

Hello everyone,

I have very long genome sequences on which I need to do some classification. What I want to try is to use a transformer that predicts the next token from a 512-token chunk, slide this transformer over the whole sequence, and use its outputs to work on top of the whole sequence.

For example: imagine I have a 250,000-token sequence. I would slide the transformer 488 times (250,000 / 512 ≈ 488), producing 488 output tokens, concatenate those outputs into a summary array of the sequence, and build a classifier on top of it.
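The idea above can be sketched in PyTorch. This is a minimal, illustrative sketch, not a tested recipe: the module names, the tiny model sizes, and mean-pooling as the per-window summary are all my own assumptions; a real setup would use a pretrained next-token model and a proper pooling/aggregation scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

WINDOW = 512      # chunk length the transformer sees at once
D_MODEL = 64      # tiny for the sketch; a real model would be larger
N_CLASSES = 2     # hypothetical number of classes

class WindowedClassifier(nn.Module):
    """Encode a long sequence in 512-token windows, summarize each
    window into one vector, pool the summaries, and classify."""

    def __init__(self, vocab_size=8, d_model=D_MODEL, n_classes=N_CLASSES):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, tokens):
        # tokens: (seq_len,) long tensor; pad so it splits into full windows
        pad = (-tokens.size(0)) % WINDOW
        tokens = F.pad(tokens, (0, pad))
        windows = tokens.view(-1, WINDOW)        # (n_windows, WINDOW)
        h = self.encoder(self.embed(windows))    # (n_windows, WINDOW, d_model)
        summaries = h.mean(dim=1)                # one vector per window
        pooled = summaries.mean(dim=0)           # summary of the whole sequence
        return self.head(pooled)                 # (n_classes,) logits

model = WindowedClassifier()
# Short demo sequence; the real one would be ~250,000 tokens (≈489 windows).
seq = torch.randint(0, 8, (5_000,))
with torch.no_grad():
    logits = model(seq)
print(logits.shape)  # torch.Size([2])
```

Here each window is processed independently (so memory stays bounded by the window size), and the classifier only ever sees the pooled per-window summaries; concatenating the summaries instead of averaging them would preserve positional information at the cost of a fixed-length input requirement.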

I’m trying to find examples that could guide me in this direction, but I can hardly find any. Does anyone think this is a good idea? Where could I look for examples of sliding a transformer/LSTM over a longer sequence?

Thank you very much, I’ll appreciate any help!


Hi @mdelas,

I’m not very familiar with your problem, but I found a thread in the forum that seems similar to yours, though for a question-answering task: Handling long text in BERT for Question Answering


Thank you very much @rwheel for your suggestion! I will follow it and try to figure out how I can use it. I posted and edited this question on deep learning - Sliding Transformer model into longer sequence - Stack Overflow. Do you have any other references to similar work over there?

No, I haven’t read anything else about it. However, if I find something of interest, I will let you know 🙂

Have a good day.
