Hi,
I want to build:

1. a multiclass model (e.g. sentiment with the labels VeryPositive, Positive, No_Opinion, Mixed_Opinion, Negative, VeryNegative), and
2. a multilabel, multiclass model to detect 10 topics in phrases (e.g. Science, Business, Religion, etc.),

and I am not sure where to find the best model for these types of tasks.
I understand this refers to the sequence classification task. So I could search for a model tagged with that task on your model repository site - but not all models are tagged like that, and the transformers API seems to support many more task applications beyond a model's original training.
With the code below I found that I can take a model that originally supports 5 labels and load it into a ConvBertForSequenceClassification model that supports, for example, 25 labels. Would this (plus softmax for 1. or sigmoid for 2., and fine-tuning) be the correct way to pick up an existing model and implement 1. or 2., or is there a different, more effective way to choose and fine-tune a model?
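To make the distinction concrete, here is a minimal PyTorch sketch of the two objectives I mean (the shapes and names are just for illustration, not from any real model):

```python
import torch
import torch.nn as nn

batch, hidden, num_sentiments, num_topics = 4, 768, 6, 10
features = torch.randn(batch, hidden)  # stand-in for pooled encoder output

# 1. Multiclass (exactly one sentiment per phrase): softmax over 6 classes.
sentiment_head = nn.Linear(hidden, num_sentiments)
sentiment_logits = sentiment_head(features)
sentiment_loss = nn.CrossEntropyLoss()(  # applies log-softmax internally
    sentiment_logits, torch.randint(0, num_sentiments, (batch,)))

# 2. Multilabel (any subset of 10 topics): independent sigmoid per topic.
topic_head = nn.Linear(hidden, num_topics)
topic_logits = topic_head(features)
topic_loss = nn.BCEWithLogitsLoss()(  # applies sigmoid internally
    topic_logits, torch.randint(0, 2, (batch, num_topics)).float())

# At inference: argmax for 1., threshold the sigmoid probabilities for 2.
sentiment_pred = sentiment_logits.argmax(dim=-1)           # shape (batch,)
topic_pred = torch.sigmoid(topic_logits) > 0.5             # shape (batch, 10)
```

So my understanding is that only the head and the loss differ between the two tasks, while the encoder underneath is the same.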
Thanks, dirk
from transformers import pipeline

nlp = pipeline("sentiment-analysis", model="nlptown/bert-base-multilingual-uncased-sentiment")

result = nlp("I hate you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
# label: 1 star, with score: 0.6346

result = nlp("I love you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
# label: 5 stars, with score: 0.8547
from transformers import ConvBertForSequenceClassification, ConvBertTokenizer

convBertModel = ConvBertForSequenceClassification.from_pretrained(
    "nlptown/bert-base-multilingual-uncased-sentiment", num_labels=25)
convBertTokenizer = ConvBertTokenizer.from_pretrained(
    "nlptown/bert-base-multilingual-uncased-sentiment")

print(f"num_labels: {convBertModel.num_labels}")
print(f"classifier: {convBertModel.classifier}")
# num_labels: 25
# classifier: ConvBertClassificationHead(
#   (dense): Linear(in_features=768, out_features=768, bias=True)
#   (dropout): Dropout(p=0.1, inplace=False)
#   (out_proj): Linear(in_features=768, out_features=25, bias=True)
# )
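If it helps clarify what I am asking: as far as I can tell, the head shape is driven entirely by num_labels in the config, so the same 25-way head can be built without downloading any checkpoint at all (randomly initialized, only useful for inspecting the architecture). I am assuming here that problem_type is supported for ConvBert in the installed transformers version:

```python
from transformers import ConvBertConfig, ConvBertForSequenceClassification

# Fresh config, no pretrained weights; problem_type should switch the
# built-in loss to BCEWithLogitsLoss (my assumption from the docs).
config = ConvBertConfig(num_labels=25,
                        problem_type="multi_label_classification")
model = ConvBertForSequenceClassification(config)

print(model.classifier.out_proj)  # Linear head with out_features=25
```

But I am unsure whether relying on this re-initialized head plus fine-tuning is the intended workflow.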