How to find similarity in documents longer than input sequence length?

I am fairly new to ML/AI, so I apologise beforehand if I have misunderstood things.

From "Computing Sentence Embeddings" in the Sentence-Transformers documentation (sbert.net): "You cannot increase the length higher than what is maximally supported by the respective transformer model."

I am trying to measure the similarity between two user-provided documents that exceed the input sequence limit of most SBERT models (around 200-300 words). How should I compute the similarity between them? The only suggestion I could find is to simply truncate the input.
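To make the question concrete, here is a rough sketch of one alternative to truncation that I could try: split each document into overlapping word windows that fit the limit, embed each window, average the window embeddings, and compare the averages with cosine similarity. This is only my assumption about a workable approach, not something the documentation recommends. The names `split_into_chunks`, `document_similarity` and `toy_embed` are hypothetical, and `toy_embed` is a hashed bag-of-words stand-in so the snippet runs without downloading a model.

```python
import hashlib
import math
from typing import Callable, List, Sequence

Vector = List[float]


def split_into_chunks(text: str, max_words: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping windows of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the document
    return chunks


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def document_similarity(
    doc_a: str,
    doc_b: str,
    embed: Callable[[List[str]], Sequence[Sequence[float]]],
    max_words: int = 200,
    overlap: int = 50,
) -> float:
    """Compare two long documents via the mean of their chunk embeddings."""
    chunks_a = split_into_chunks(doc_a, max_words, overlap)
    chunks_b = split_into_chunks(doc_b, max_words, overlap)
    if not chunks_a or not chunks_b:
        return 0.0
    return cosine(mean_vector(embed(chunks_a)), mean_vector(embed(chunks_b)))


def toy_embed(chunks: List[str], dim: int = 64) -> List[Vector]:
    """Hypothetical stand-in embedder: hashed bag-of-words counts."""
    vectors = []
    for chunk in chunks:
        vec = [0.0] * dim
        for word in chunk.lower().split():
            index = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
            vec[index] += 1.0
        vectors.append(vec)
    return vectors


# With a real model (assumption: sentence-transformers is installed), the
# embed argument could be something like:
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   embed = lambda chunks: model.encode(chunks).tolist()
```

A call would then look like `document_similarity(text_a, text_b, toy_embed)`. I am not sure whether averaging chunk embeddings loses too much information, or whether comparing chunks pairwise and aggregating the scores would be better, so any guidance on that is welcome. Note that the limit is really counted in tokens rather than words, so `max_words` would need to be set conservatively.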
