Hi all,
I started a small project in which I am trying to fine-tune a zero-shot classification model on a proprietary dataset. I was thinking of using the NLI approach, building contradiction and entailment statements for each of my sentence/label pairs.
I have a dataset of sentences, each of which has multiple true labels.
However, I am not sure what the best way to approach this is, given that in the literature I have only seen the case where there is one label per sentence.
For example:
Sentence 1. Classes = [‘A’,‘B’,‘C’]
Should I build my dataset by generating three different samples:
Sentence 1. This is about ‘A’ + Entailment label
Sentence 1. This is about ‘B’ + Entailment label
Sentence 1. This is about ‘C’ + Entailment label
or generating only one as follows:
Sentence 1. This is about A, B, C. + Entailment label
The recommended approach is what you are suggesting in option 1. Basically, present each of your multiple labels as a separate entailment example; the author also suggests presenting an equal number of contradictions.
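
In case it helps, here is a rough sketch of how option 1 could look in Python. The function name `build_nli_pairs`, the extra labels 'D' to 'G', and the seed are made up for illustration; the hypothesis template is the one from your question. It also assumes that contradictions are built by sampling labels that are not true for the sentence, which is my guess at a reasonable way to get "an equal number of contradictions" rather than something stated above.

```python
import random


def build_nli_pairs(sentence, true_labels, all_labels,
                    template="This is about {}.", seed=0):
    """Return (premise, hypothesis, nli_label) tuples for one sentence."""
    rng = random.Random(seed)

    # One entailment example per true label (option 1).
    pairs = [(sentence, template.format(label), "entailment")
             for label in true_labels]

    # Assumption: contradictions come from labels that do NOT apply.
    negatives = [label for label in all_labels if label not in true_labels]

    # Match the number of entailments, capped by how many negatives exist.
    k = min(len(true_labels), len(negatives))
    for label in rng.sample(negatives, k):
        pairs.append((sentence, template.format(label), "contradiction"))

    return pairs


# Hypothetical label set: 'A', 'B', 'C' are true, 'D'..'G' are not.
pairs = build_nli_pairs(
    "Sentence 1.",
    true_labels=["A", "B", "C"],
    all_labels=["A", "B", "C", "D", "E", "F", "G"],
)
for premise, hypothesis, nli_label in pairs:
    print(premise, hypothesis, "+", nli_label)
```

Each tuple can then be fed to the NLI model as a premise/hypothesis pair with its label. If a sentence has more true labels than there are remaining labels to sample from, this sketch simply produces fewer contradictions than entailments.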