Multi-Label Model - Labels per part of sentence

Hi All! I have fine-tuned a multi-label model based on BERT base-uncased. I've been asked to output predictions not just as the top-k labels for the text, but as the top-k labels together with the part of the text associated with each label.

Is this possible? I’ve seen the outputs of NER models that show the token the prediction was based on. Is it possible to do the same/similar thing for a multi-label model?

An output example might be:

text = "I am paid well and have a great team"
pipe(text)

'label': 'Pay',
'score': 0.98,
'tokens': {'I', 'am', 'paid'}
etc.

I greatly appreciate any guidance/suggestions on how to get this data out.

Hey Tardis,
Interesting problem.
A few questions: how long are your input texts, and how long should the output spans be?
I did something similar once. I first split the text into sentences (NLTK has a sentence tokenizer for this) and then ran the multi-label model on each sentence separately. After that I filtered the predictions by a probability threshold and merged them into the final list of labels. You could do the same, and for each label keep the sentence it was predicted on; that sentence would be the part of the text associated with the label.
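A minimal sketch of that idea, in case it helps. Everything here is an assumption for illustration: split_sentences is a naive regex splitter standing in for NLTK's sent_tokenize, toy_predict is a hypothetical stand-in for your fine-tuned model (it just returns one score per label), and the 0.5 threshold is arbitrary. Note that your example text is a single sentence, so a sentence-level split would return it whole; you would need a finer split (clauses, for example) to separate "paid well" from "great team".

```python
import re
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    # Assumption: in practice nltk.sent_tokenize would be more robust.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def labels_per_sentence(
    text: str,
    predict_fn: Callable[[str], Dict[str, float]],
    threshold: float = 0.5,
) -> List[dict]:
    # Run the classifier on each sentence and keep every label whose
    # score reaches the threshold, together with the sentence it came from.
    results = []
    for sentence in split_sentences(text):
        for label, score in predict_fn(sentence).items():
            if score >= threshold:
                results.append({"label": label, "score": score, "text": sentence})
    # Highest-scoring labels first.
    results.sort(key=lambda r: r["score"], reverse=True)
    return results


def toy_predict(sentence: str) -> Dict[str, float]:
    # Hypothetical stand-in for the real model: keyword-based scores only.
    lowered = sentence.lower()
    return {
        "Pay": 0.98 if "paid" in lowered else 0.02,
        "Team": 0.95 if "team" in lowered else 0.03,
    }


for row in labels_per_sentence("I am paid well. I have a great team.", toy_predict):
    print(row)
```

To use it with the real model, replace toy_predict with a function that runs your model on one sentence and returns a label-to-probability mapping.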
Eitan