How to find similarity in documents longer than input sequence length?

bxff · August 20, 2022, 10:03pm

I am fairly new to ML/AI, so I apologise beforehand if I have misunderstood anything.

You cannot increase the length higher than what is maximally supported by the respective transformer model – Computing Sentence Embeddings — Sentence-Transformers documentation (sbert.net)

I am trying to measure the similarity between two user-provided documents that don’t fit within the sequence limit of most SBERT models, which is around 200-300 words. How can I compute a similarity score for them? The only suggestion I could find was to simply truncate the input.
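For concreteness, here is a rough sketch of one alternative to truncation: split each document into overlapping word windows that fit under the limit, embed every window, average the window vectors into one document vector, and compare documents by cosine similarity. Everything named here is an assumption for illustration: `chunk_words`, `embed_document`, `cosine`, and the 200-word window are hypothetical, and `toy_embed` is only a hashed bag-of-words stand-in so the snippet runs without downloading a model. Real models count tokens rather than words, so the window size would need tuning, and mean pooling is just one of several ways to combine chunk embeddings.

```python
import math
import zlib
from typing import Callable, List, Sequence

Vector = Sequence[float]


def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into windows of at most max_words words, overlapping by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the document
    return chunks


def toy_embed(text: str, dim: int = 64) -> List[float]:
    """Placeholder embedder (hashed bag of words). NOT a real sentence model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vec


def embed_document(
    text: str,
    embed: Callable[[str], Vector] = toy_embed,
    max_words: int = 200,
    overlap: int = 20,
) -> List[float]:
    """Embed each chunk separately and mean-pool the chunk vectors."""
    chunks = chunk_words(text, max_words, overlap)
    if not chunks:
        raise ValueError("cannot embed an empty document")
    vectors = [embed(chunk) for chunk in chunks]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc_a = "contract terms about payment and delivery " * 100
    doc_b = "contract terms about delivery and liability " * 100
    print(cosine(embed_document(doc_a), embed_document(doc_b)))
```

To try this with a real model, the `embed` argument could presumably be swapped for the `encode` method of a loaded `SentenceTransformer`, since it only needs to map a string to a vector of floats.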

