Token Classification Models on (Very) Long Text

nottakumasato · August 30, 2022, 7:58pm

Hi everyone,

From what I have seen, most token classification models have a maximum sequence length of less than 1k tokens. Are there any models that can be used (or customized) for very long texts (long-form documents)?

Assuming a model’s max token length is customizable, I assume its memory footprint has to be small enough to batch a large number of embeddings and weights in GPU memory?

Any help/recommendation would be greatly appreciated in tackling this problem.

Thanks!

ccdv · August 31, 2022, 1:25pm

Hi @nottakumasato

Most models have a 512-token limit and cannot extrapolate to longer sequences.
Memory footprint also increases quadratically with sequence length because standard attention is O(n²).
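
As a rough illustration of that quadratic growth, here is a back-of-the-envelope sketch. The 12 heads and float32 scores are assumptions (roughly a BERT-base-sized model), `attention_scores_bytes` is a hypothetical helper name, and only the score matrices of a single layer are counted:

```python
def attention_scores_bytes(seq_len, num_heads=12, bytes_per_value=4, batch_size=1):
    """Bytes needed to hold the full attention score matrices of ONE layer.

    Standard attention builds one (seq_len x seq_len) matrix per head,
    which is where the O(n^2) term comes from.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value


# Assumed settings: 12 heads, float32, batch size 1.
for n in (512, 4096, 16384):
    mib = attention_scores_bytes(n) / 2**20
    print(f"{n:>6} tokens -> {mib:,.0f} MiB per layer")
```

Under these assumptions, 512 tokens need about 12 MiB per layer, while 16k tokens need about 12 GiB per layer, before counting weights, other activations, or gradients.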

The best way to handle long sequences is to use a custom attention mechanism.
You can try this repo with a small model and a small block size; you should be able to process sequences of 16k tokens.

AndreaSottana · August 31, 2022, 3:43pm

The BART model goes up to 1024 tokens.
Then there are models which can take up to 16k tokens, but they’re more custom and not always available out of the box on HuggingFace. One of these is the Longformer, for example. Their model can be accessed via HuggingFace as shown here.

You may also want to take a look at this recent paper from Google. It is a model specifically for text generation (not exactly classification as you asked, but it gives you an idea of what’s possible), and they have also made their code available (you can see more details here and here). There is still an open PR which will be merged into the main HuggingFace branch soon, so right now you’d have to take their code from the fork.

nottakumasato · September 2, 2022, 10:38am

Thank you for the replies @ccdv & @AndreaSottana !

So I guess there are two ways to tackle this:

1. Split up the input text into segments that are shorter than the model’s max sequence (token) length
2. Find a model like Pegasus-X or Longformer that can handle all the samples (based on their sequence length) in my dataset
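
A minimal sketch of the first option, assuming the text is already tokenized into a flat list of token ids. The helper names `split_into_windows` and `merge_window_labels` are hypothetical, the `max_len=512` and `stride=128` values are only illustrative, and a real model would also need room for special tokens inside each window. Overlapping windows are used so that every token is seen with some context on both sides, and the merge step keeps the label from the window where the token sits farthest from a window edge:

```python
def split_into_windows(token_ids, max_len=512, stride=128):
    """Return (start, window) pairs; consecutive windows overlap by `stride` tokens."""
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    step = max_len - stride
    windows = []
    start = 0
    while True:
        windows.append((start, token_ids[start:start + max_len]))
        if start + max_len >= len(token_ids):
            break  # this window reached the end of the document
        start += step
    return windows


def merge_window_labels(windows, window_labels, total_len):
    """Merge per-window labels into one label per token of the full document.

    For tokens covered by two windows, keep the label from the window in
    which the token is farthest from the window boundary (most context).
    """
    merged = [None] * total_len
    best_margin = [-1] * total_len
    for (start, window), labels in zip(windows, window_labels):
        for i, label in enumerate(labels):
            margin = min(i, len(window) - 1 - i)  # distance to nearest edge
            pos = start + i
            if margin > best_margin[pos]:
                best_margin[pos] = margin
                merged[pos] = label
    return merged


# Toy run: a stand-in "classifier" labels each token by the parity of its id.
tokens = list(range(1000))
windows = split_into_windows(tokens, max_len=512, stride=128)
labels = [[t % 2 for t in window] for _, window in windows]
merged = merge_window_labels(windows, labels, len(tokens))
print(len(windows), len(merged))
```

In practice the toy classifier would be replaced by a real token classification model run on each window. HuggingFace tokenizers can also produce overlapping windows directly via their `stride` and `return_overflowing_tokens` arguments.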

Option #1 seems more plausible, so I will give it a try.
Thanks!

nottakumasato · September 20, 2022, 1:33am

Is it also possible to use RNN-based (non-Transformer) models? I assume the tradeoff is model “accuracy” vs. the max sequence length?

nottakumasato · September 20, 2022, 3:20am

Best way to handle long sequences is to use a custom attention mechanism.

Is there a specific reason that you didn’t recommend using earlier RNN-based models? Since they don’t have an attention mechanism, their memory footprint should theoretically be linear in the sequence length, right?

nottakumasato · September 20, 2022, 3:21am

You can try this repo

Is there a paper about this LSG attention mechanism? It looks interesting, and any further info would help me understand it a bit more.

Vignesh1997 · March 7, 2023, 6:22am

Hey, did you try the paper? If possible, can you share the results and how well it worked?

nottakumasato · March 9, 2023, 5:30pm

Unfortunately not
