Hi everybody, and thank you in advance to anyone who can help me out. I am not a total beginner with the Hugging Face libraries (I have already built a well-functioning sentiment analyzer); however, I have mostly followed tutorials and integrated their content without looking too closely at what each line of code does. To learn more, I have put together a document classifier using a couple of tutorials I found online.
I have built the trainer and the validator, and they work just fine. I started with a dataset that has 6 possible labels, where each text can have 0, 1, or more than 1 label (a multi-label setup). I trained the model and saved it. My problem is: now what? I can't work out how to do the prediction part. Here is where I am:
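```python
import torch

# destination_folder, testing_loader and device are defined elsewhere in my script
def validation():
    # Load the full model object that was saved after training
    model = torch.load(destination_folder + 'model.pt')
    model.eval()  # inference mode: disables dropout
    with torch.no_grad():  # no gradients are needed for prediction
        for _, data in enumerate(testing_loader, 0):
            ids = data['ids'].to(device, dtype=torch.long)
            mask = data['mask'].to(device, dtype=torch.long)
            token_type_ids = data['token_type_ids'].to(device, dtype=torch.long)
            preds = model(ids, mask, token_type_ids)
            # Index of the highest-scoring label in each row, shifted to 1-based
            print(preds.argmax(1) + 1)
```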
This is a snippet of the output of the print command:

```
tensor([1, 1, 1, 1])
tensor([6, 1, 1, 1])
tensor([1, 1, 1, 1])
tensor([1, 5, 2, 1])
```
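A note on the output above: `preds.argmax(1)` keeps only the index of the single highest-scoring column in each row, so `+ 1` turns it into a 1-based label number and every other label is discarded. In a multi-label setup each label is usually scored independently instead. Below is a minimal sketch of that decoding, assuming the model returns raw logits of shape (batch, 6) (as it would if trained with `BCEWithLogitsLoss`, which is an assumption here) and using an illustrative 0.5 threshold. It is written in plain Python so it runs on its own; on tensors, `torch.sigmoid(preds) >= 0.5` gives the same per-label decision.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent probability in (0, 1), without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_batch(logits, threshold=0.5):
    """For each row of logits, return the 1-based label numbers whose
    sigmoid probability reaches the threshold (the list may be empty)."""
    results = []
    for row in logits:
        labels = [i + 1 for i, x in enumerate(row) if sigmoid(x) >= threshold]
        results.append(labels)
    return results


# Hypothetical logits for a batch of 2 documents and 6 labels
batch = [
    [2.0, -1.5, -3.0, -2.0, 0.4, -0.1],
    [-2.0, -2.5, -1.0, -3.0, -0.7, -4.0],
]
print(decode_batch(batch))  # [[1, 5], []]
```

An empty list means no label cleared the threshold, which corresponds to the texts in the dataset that have 0 labels; the threshold itself is a tunable choice, not a fixed rule.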
I’ve done this by adapting the validation routine and running it on the validation data, while in reality I need to do it for a single line of text. Regardless of how the data is fed to the prediction function, though, how do I read the prediction output? How do I go from “This is the text of my document to be classified” to “This document is 75% label1, 15% label5, 2% label6”?
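For a single document the same idea applies with a batch of one: tokenize the one string exactly as the training data was tokenized, run the model under `torch.no_grad()`, and read row 0 of the output. With independent sigmoid scores the percentages do not have to add up to 100% (a text can be 80% likely to carry one label and 60% likely to carry another); a softmax would force them to sum to 100%, which suits single-label classification rather than this dataset. The sketch below covers only the last step, turning one row of logits into a readable summary. The label names, the `min_prob` cutoff, and the logits (chosen to reproduce the example sentence) are all hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent probability in (0, 1), without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def describe(logits, label_names, min_prob=0.0):
    """Format one document's logits as 'NN% name' pairs, highest probability first,
    skipping labels whose probability is below min_prob."""
    probs = [(name, sigmoid(x)) for name, x in zip(label_names, logits)]
    probs.sort(key=lambda p: p[1], reverse=True)
    parts = [f"{p:.0%} {name}" for name, p in probs if p >= min_prob]
    return "This document is " + ", ".join(parts)


# Hypothetical label names and logits for one document
# (with a real model this row might come from something like preds[0].tolist())
LABELS = ["label1", "label2", "label3", "label4", "label5", "label6"]
logits = [1.1, -5.0, -6.0, -5.5, -1.7, -3.9]
print(describe(logits, LABELS, min_prob=0.01))
# This document is 75% label1, 15% label5, 2% label6
```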
Again, thank you in advance for any help!