Anomaly Detection / Out of Domain Detection with BERT

Are there any best practices for detecting whether a document is outside the domain that a fine-tuned BERT model was trained on? One idea is to run anomaly detection before applying the fine-tuned model. The anomaly detector could be a one-class SVM or an autoencoder trained on SBERT embeddings. Another option would be to add an “Other” class to the classification model, but the resulting training set would probably be highly imbalanced.
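
To make the first idea concrete, here is a minimal sketch of a pre-filter that scores documents before they reach the classifier. It is not a one-class SVM or an autoencoder: to stay dependency-free it uses a simpler stand-in, a cosine-distance-to-centroid threshold fitted on in-domain embeddings. The class `CentroidOODDetector`, the helper `fake_embedding`, the 0.95 quantile, and the synthetic 16-dimensional vectors are all assumptions made for illustration; in practice the vectors would come from an SBERT encoder.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_distance(a, b):
    """Cosine distance between two vectors that are already L2-normalized."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


class CentroidOODDetector:
    """Hypothetical pre-filter: flags embeddings far from the in-domain centroid."""

    def __init__(self, quantile=0.95):
        self.quantile = quantile  # fraction of training docs kept as in-domain
        self.centroid = None
        self.threshold = None

    def fit(self, embeddings):
        normed = [l2_normalize(e) for e in embeddings]
        dim = len(normed[0])
        mean = [sum(v[i] for v in normed) / len(normed) for i in range(dim)]
        self.centroid = l2_normalize(mean)
        # Threshold = chosen quantile of the training distances.
        dists = sorted(cosine_distance(v, self.centroid) for v in normed)
        idx = min(len(dists) - 1, int(self.quantile * len(dists)))
        self.threshold = dists[idx]
        return self

    def score(self, embedding):
        """Higher score means further from the training domain."""
        return cosine_distance(l2_normalize(embedding), self.centroid)

    def is_out_of_domain(self, embedding):
        return self.score(embedding) > self.threshold


def fake_embedding(center, noise, rng):
    """Synthetic stand-in for an SBERT embedding (assumption for this sketch)."""
    return [c + rng.gauss(0.0, noise) for c in center]


rng = random.Random(0)
in_center = [1.0] * 8 + [0.0] * 8   # pretend in-domain documents cluster here
out_center = [0.0] * 8 + [1.0] * 8  # pretend out-of-domain documents cluster here

train = [fake_embedding(in_center, 0.1, rng) for _ in range(200)]
detector = CentroidOODDetector(quantile=0.95).fit(train)

in_doc = fake_embedding(in_center, 0.1, rng)
out_doc = fake_embedding(out_center, 0.1, rng)

print("in-domain score:", detector.score(in_doc))
print("out-of-domain score:", detector.score(out_doc))
print("flag out_doc:", detector.is_out_of_domain(out_doc))
```

Only documents that pass the filter would then be sent to the fine-tuned model. The quantile sets how many in-domain documents are wrongly rejected, so it would need tuning on held-out data, and real embeddings are unlikely to separate as cleanly as these synthetic clusters.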

I am wondering whether there is a recommended approach for this common real-world problem.
