I have a bunch of documents and a set of criteria. Based on those criteria, it should prioritize the documents and list them out.
Classification models?
Yes, there are models on Hugging Face that can be used for document classification. Here are some candidate models and a brief explanation of their suitability:
- ProsusAI/finbert [2]: A text classification model trained on financial text. It can be adapted for document classification if your documents are finance-related.
- jinaai/jina-reranker-v2-base-multilingual [2]: A multilingual reranker rather than a general-purpose classifier: it scores how relevant each document is to a query. It is useful if a criterion can be written as a text query and your documents are in different languages.
- distilbert/distilbert-base-uncased-finetuned-sst-2-english [2]: An English sentiment analysis model. It does not address general document classification directly, but it can be fine-tuned for the task if your criteria relate to sentiment.
- cardiffnlp/twitter-roberta-base-sentiment-latest [2]: Another sentiment analysis model, trained on tweets. It can be adapted for document classification if sentiment is a relevant criterion.
- MilaNLProc/xlm-emo-t [2]: An emotion classification model for multilingual text. It can be useful if your criteria relate to the emotional tone of the documents.
- unitary/toxic-bert [2]: A toxicity detection model. It can be used if you need to classify documents by toxicity or harmful content.
- SamLowe/roberta-base-go_emotions [2]: An emotion classification model trained on the GoEmotions dataset. It can be relevant if your criteria involve the emotional content of the documents.
- textdetox/xlmr-large-toxicity-classifier [2]: Another toxicity detection model, similar to unitary/toxic-bert.
- mixedbread-ai/mxbai-rerank-base-v1 [2]: A reranking model that orders documents by relevance to a query. It fits prioritization directly when a criterion can be phrased as a query.
- LayoutLM-based models [4][5]: LayoutLM is designed for document understanding tasks, including classification, and models the layout and structure of a document alongside its text. It can be a strong candidate if your documents have a structured format.
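To make the reranking option concrete, here is a minimal sketch of criteria-based prioritization: score every document against one criterion, then sort by score. The function names (`prioritize`, `overlap_scorer`, `cross_encoder_scorer`) and the sample documents are made up for illustration. The default scorer is a toy word-overlap stand-in so the sketch runs offline; `cross_encoder_scorer` assumes the `sentence-transformers` package is installed and that the reranker loads through its `CrossEncoder` class.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer takes (criterion, documents) and returns one relevance score per document.
Scorer = Callable[[str, Sequence[str]], List[float]]


def overlap_scorer(criterion: str, docs: Sequence[str]) -> List[float]:
    """Toy stand-in: fraction of the criterion's words that appear in each document."""
    wanted = set(criterion.lower().split())
    scores = []
    for doc in docs:
        found = wanted & set(doc.lower().split())
        scores.append(len(found) / len(wanted) if wanted else 0.0)
    return scores


def cross_encoder_scorer(model_name: str = "mixedbread-ai/mxbai-rerank-base-v1") -> Scorer:
    """Build a scorer backed by a reranker (assumption: needs a model download)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name)

    def score(criterion: str, docs: Sequence[str]) -> List[float]:
        pairs = [(criterion, doc) for doc in docs]
        return [float(s) for s in model.predict(pairs)]

    return score


def prioritize(
    criterion: str, docs: Sequence[str], scorer: Scorer = overlap_scorer
) -> List[Tuple[float, str]]:
    """Return (score, document) pairs, highest score first."""
    scores = scorer(criterion, docs)
    return sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)


docs = [
    "Quarterly team offsite agenda",
    "Reminder: invoice 1042 payment is overdue",
    "New invoice template for vendors",
]
ranked = prioritize("overdue invoice payment", docs)
for score, doc in ranked:
    print(f"{score:.2f}  {doc}")
```

To use a real model instead of the toy scorer, pass `scorer=cross_encoder_scorer()` to `prioritize`. With several criteria, one option is to run `prioritize` once per criterion and combine the scores.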
Recommendation:
For document classification with criteria-based prioritization, LayoutLM-based models [4][5] are recommended when your documents have a structured layout, because they use that structure in addition to the text. For more domain-specific tasks, consider fine-tuning ProsusAI/finbert [2] or another domain-specific model.
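If you go the classifier route, one way to turn label scores into a priority order is a weighted sum per document. The sketch below is an illustration under assumptions, not a tested recipe: the weights, the sample texts, and the helper names are invented, and `toy_classifier` is a hypothetical stand-in so the code runs offline. `hf_classifier` assumes the `transformers` package and that the model's labels are `positive`, `negative`, and `neutral`; check the model's config before relying on that.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A classifier maps documents to one {label: score} dict per document.
Classifier = Callable[[Sequence[str]], List[Dict[str, float]]]


def toy_classifier(docs: Sequence[str]) -> List[Dict[str, float]]:
    """Hypothetical stand-in: treats any mention of 'loss' as negative."""
    out = []
    for doc in docs:
        neg = 0.9 if "loss" in doc.lower() else 0.1
        out.append({"negative": neg, "neutral": 0.0, "positive": 1.0 - neg})
    return out


def hf_classifier(model_name: str = "ProsusAI/finbert") -> Classifier:
    """Wrap a Hugging Face text-classification pipeline (needs a model download)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from transformers import pipeline

    # top_k=None asks the pipeline for scores on every label, not just the best one.
    pipe = pipeline("text-classification", model=model_name, top_k=None)

    def classify(docs: Sequence[str]) -> List[Dict[str, float]]:
        results = pipe(list(docs))
        return [{item["label"]: item["score"] for item in per_doc} for per_doc in results]

    return classify


def priority(label_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of label scores; labels without a weight count as zero."""
    return sum(weights.get(label, 0.0) * s for label, s in label_scores.items())


def prioritize_by_labels(
    docs: Sequence[str], weights: Dict[str, float], classifier: Classifier = toy_classifier
) -> List[Tuple[float, str]]:
    """Return (priority, document) pairs, highest priority first."""
    scored = [(priority(s, weights), d) for s, d in zip(classifier(docs), docs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Illustrative weights: surface negative reports first.
WEIGHTS = {"negative": 1.0, "positive": 0.2}
reports = [
    "Revenue grew 12% year over year",
    "The unit reported a net loss this quarter",
]
ordered = prioritize_by_labels(reports, WEIGHTS)
for value, doc in ordered:
    print(f"{value:.2f}  {doc}")
```

Swapping in a real model means passing `classifier=hf_classifier()`. The weights are where your criteria live, so they should be adjusted to whatever "high priority" means for your documents.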