I want to do text-to-text classification.
I know that classifying text to a set of categories is a simple problem to solve.
In my case, rather than categories, I have open-ended text. On the left I have my set of texts. On the right, one would typically have numerical categories, and it would be simple to train a model. Instead, I have text there as well.
A relationship between a pair of texts doesn’t imply that the two texts are semantically similar.
What would you recommend for training a model that can predict the relationships from a text on the left to the set of texts on the right?
A naive and simple approach would be to assign each text on the right a numerical value, but then I would be losing valuable information.
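To make the naive approach concrete, here is a minimal sketch of what I mean. The example pairs are made up purely for illustration (they are not my real data), and the names `pairs`, `label_of`, and `y` are just placeholders.

```python
# Hypothetical toy pairs (left text -> related right text); not real data.
pairs = [
    ("engine makes a rattling noise", "replace the timing chain"),
    ("screen flickers on startup", "update the graphics driver"),
    ("engine stalls when idling", "replace the timing chain"),
]

# Assign each distinct right-hand text an integer class id.
right_texts = sorted({right for _, right in pairs})
label_of = {text: i for i, text in enumerate(right_texts)}

# Targets for an ordinary classifier: the wording of the right-hand
# text is discarded and only its id is kept.
y = [label_of[right] for _, right in pairs]
```

The classifier only ever sees the ids in `y`, so whatever the wording of the right-hand texts could tell it is gone, and a right-hand text that never appeared in training has no id at all.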
Perhaps a better approach would be to fine-tune a sentence similarity model and then, to find the relationships for a text on the left, compute its similarity to each text on the right and take the top-k?
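Roughly, the lookup step I have in mind would look like the sketch below. The `embed` function is only a placeholder (plain bag-of-words counts) standing in for the fine-tuned encoder, which I don't have yet. The example texts are made up, and `embed`, `cosine`, and `top_k` are names I chose for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Placeholder encoder: bag-of-words counts.
    A fine-tuned sentence encoder would replace this function."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(left_text, right_texts, k=2):
    """Score every right-hand text against the left text, keep the k best."""
    query = embed(left_text)
    scored = [(cosine(query, embed(r)), r) for r in right_texts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical right-hand texts; not real data.
right_texts = [
    "replace the timing chain",
    "update the graphics driver",
    "check the engine oil level",
]
results = top_k("the engine makes a rattling noise", right_texts, k=2)
```

Since related pairs in my data are not necessarily semantically similar, the placeholder encoder would of course do poorly. The idea is that fine-tuning on the known pairs would teach the encoder to place related texts close together, so that this top-k lookup returns the related ones.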
Suggestions? Thank you!