Sliding a Transformer over a long sequence

Hello everyone,

I have very long genome sequences that I need to classify. My idea is to train a transformer to predict the next token from a 512-token chunk, slide this transformer along the whole sequence, and then use its outputs as input to a model that works on the entire sequence.
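Training a model to predict the next token from a 512-token chunk first requires (context, next token) pairs. Below is a minimal sketch in plain Python. It assumes one token per nucleotide (A/C/G/T) and a configurable stride. `VOCAB`, `encode` and `next_token_pairs` are hypothetical names for illustration only, and the transformer itself is left out:

```python
# Minimal sketch: turn a long DNA string into (512-token context, next token)
# training pairs for a next-token model. The one-token-per-base mapping and
# the stride are illustrative assumptions, not part of any specific library.
from typing import Iterator, List, Tuple

VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}  # hypothetical tokenisation: one token per base


def encode(seq: str) -> List[int]:
    """Map each nucleotide to an integer id (raises KeyError on unknown symbols)."""
    return [VOCAB[base] for base in seq.upper()]


def next_token_pairs(
    tokens: List[int], context: int = 512, stride: int = 1
) -> Iterator[Tuple[List[int], int]]:
    """Yield (window, target) pairs where target is the token right after the window."""
    for start in range(0, len(tokens) - context, stride):
        yield tokens[start:start + context], tokens[start + context]


tokens = encode("ACGT" * 200)  # 800 synthetic tokens
pairs = list(next_token_pairs(tokens, context=512, stride=64))
print(len(pairs))  # number of training examples produced with this stride
```

A stride of 1 gives the most training examples but makes them highly redundant. A larger stride trades examples for speed.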

For example, imagine I have a 250,000-token sequence. With non-overlapping 512-token windows, I would slide the transformer 488 times (250,000 / 512 ≈ 488), producing 488 tokens. I would then concatenate these outputs into a summary array of the sequence and build a classifier on top of it.
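The windowing arithmetic in this example can be sketched as follows. The sketch assumes non-overlapping windows and one output token per window. `sliding_summary` and `predict_next_token` are hypothetical names. The latter is a trivial placeholder (it returns the most frequent token) that stands in for the trained transformer, and the sequence is random synthetic data:

```python
# Minimal sketch: slide a per-window model over a long sequence using
# non-overlapping 512-token windows and collect one output per window.
# `predict_next_token` is a hypothetical stand-in for the trained transformer.
import random
from collections import Counter
from typing import Callable, List


def sliding_summary(
    tokens: List[int], model: Callable[[List[int]], int], window: int = 512
) -> List[int]:
    """Apply `model` to each full, non-overlapping window; a trailing partial window is dropped."""
    n_windows = len(tokens) // window
    return [model(tokens[i * window:(i + 1) * window]) for i in range(n_windows)]


def predict_next_token(chunk: List[int]) -> int:
    # Placeholder for the transformer: returns the most frequent token in the chunk.
    return Counter(chunk).most_common(1)[0][0]


random.seed(0)
genome = [random.randrange(4) for _ in range(250_000)]  # synthetic stand-in for a real sequence
summary = sliding_summary(genome, predict_next_token)
print(len(summary), len(genome) % 512)  # 488 144 (488 windows, 144 leftover tokens)
```

The 144 leftover tokens can be dropped (as here), padded, or covered by one extra overlapping window. Also, if the transformer's hidden state for each window is kept (a vector instead of a single predicted token), the classifier built on the summary would likely get much more information than one token per 512 bases.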

I’ve been looking for examples that could guide me in this direction, but I can hardly find any. Does anyone think this is a good idea? Where could I find similar examples of sliding a transformer or LSTM over a longer sequence?

Thank you very much, I’d appreciate any help!

