Test if a sentence is different from the training data

linkus01122 · November 11, 2024, 2:54am

Is there a way to use SBERT (sentence-transformers) or another model to flag whether new training data is similar to the existing training data?

I am trying to use a model to determine whether new sentences intended for training are similar to the existing training data or genuinely different from it.

For example, suppose I have a corpus of strings that all talk about walls and bricks. A new sentence about adding another brick in the wall should not set off the flag, but one about painting it black should, because it isn't similar to the training corpus.
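A minimal sketch of one common way to approach this, assuming a new sentence counts as "different" when its highest cosine similarity to any existing training sentence falls below a threshold. To keep it self-contained and runnable without downloading a model, `toy_embed` is a hypothetical bag-of-words stand-in for a real sentence encoder, and the 0.5 threshold is an arbitrary assumption you would tune on your own data. The helper names (`toy_embed`, `max_similarity`, `flag_novel`) are illustrative, not part of any library.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

# Sparse vector: token -> weight.
Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for a sentence encoder such as SBERT.

    Returns a bag-of-words count vector, so it only measures word overlap,
    not meaning.
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {tok: float(n) for tok, n in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def max_similarity(sentence: str, corpus_vecs: List[Vector],
                   embed: Callable[[str], Vector] = toy_embed) -> float:
    """Highest similarity between `sentence` and any corpus sentence."""
    vec = embed(sentence)
    return max((cosine(vec, c) for c in corpus_vecs), default=0.0)


def flag_novel(new_sentences: List[str], corpus: List[str],
               embed: Callable[[str], Vector] = toy_embed,
               threshold: float = 0.5) -> Dict[str, bool]:
    """Map each new sentence to True if it looks unlike everything in `corpus`.

    `threshold` is an assumed cut-off; tune it on your own data.
    """
    corpus_vecs = [embed(s) for s in corpus]  # embed the corpus once
    return {s: max_similarity(s, corpus_vecs, embed) < threshold
            for s in new_sentences}


if __name__ == "__main__":
    corpus = [
        "Stack another brick on the wall",
        "The wall is built from bricks and mortar",
        "Lay each brick level on the wall",
    ]
    new = ["Add another brick to the wall", "Paint it black"]
    print(flag_novel(new, corpus))
    # {'Add another brick to the wall': False, 'Paint it black': True}
```

With real SBERT embeddings you would replace `toy_embed` with something like `SentenceTransformer(...).encode` and compare the resulting dense vectors (for example with `sentence_transformers.util.cos_sim`) instead of the sparse `cosine` above. Unlike bag-of-words, semantic embeddings should also score paraphrases as similar even when they share few words with the corpus.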

Related topics

Topic	Category	Replies	Views	Activity
Test a Model's knowledge	Beginners	0	252	May 3, 2022
Identifying and getting right embeddings from the fine tuned BERT on domain specific data	Intermediate	0	1331	September 8, 2021
Request for Further Information on Datasets	Beginners	0	280	November 26, 2020
Fine tuning a sentence-transformer for cosine sim on 500k sentence pairs without labels-- advice	🤗Transformers	2	1198	April 20, 2024
Any BERT model recommendation needed for getting feature of structured sentences	Beginners	0	398	June 8, 2022
