Fine-tune zero-shot classification on a multi-label dataset

Hi all,
I started a small project where I am trying to fine-tune a zero-shot classification model on a proprietary dataset. I was thinking of using the NLI approach, building contradiction and entailment statements for each of my sentence/label pairs.

My dataset consists of sentences, each with multiple true labels.

However, I am not sure what the best way to approach this is, given that in the literature I have only seen the case where there is one label per sentence.

For example:

Sentence 1. Classes = [‘A’,‘B’,‘C’]

Should I build my dataset by generating three different samples:

Sentence 1. This is about ‘A’ + Entailment label
Sentence 1. This is about ‘B’ + Entailment label
Sentence 1. This is about ‘C’ + Entailment label

or by generating only one, as follows:

Sentence 1. This is about A, B, C. + Entailment label

I am happy to hear any other ideas on this.

Thanks a lot!

Here is one approach depending on the number of labels you have

Thanks for pointing to this resource, but it is useful only for a classic multi-label classification problem.

I am looking into fine-tuning a zero-shot classification model using the entailment-contradiction approach.

I am curious why you wouldn't treat it as a multi-label classification problem. Is there a reason it needs to be NLI? It will help me learn! Thanks

According to this here

The recommended approach is what you are suggesting in option 1: present each of your multiple labels as a separate entailment. The author also suggests presenting an equal number of contradictions.
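
In case it helps, here is a minimal sketch of how option 1 with balanced contradictions could be built. Everything in it is an assumption for illustration: the function name build_nli_pairs, the "This is about {}." template, and the extra labels D to G, which stand in for the rest of your label set. Contradictions are sampled at random from the labels that are not true for the sentence; other sampling strategies (for example, picking hard negatives) may work better on your data.

```python
import random


def build_nli_pairs(sentence, true_labels, all_labels,
                    template="This is about {}.", seed=0):
    """Build (premise, hypothesis, nli_label) triples for one sentence.

    Hypothetical helper: one entailment pair per true label, plus an
    equal number of contradiction pairs sampled from the other labels.
    """
    rng = random.Random(seed)  # seeded so the sampling is reproducible

    # Labels that do NOT apply to this sentence are contradiction candidates.
    negatives = [label for label in all_labels if label not in true_labels]

    # Cap at the number of available negatives in case there are too few.
    k = min(len(true_labels), len(negatives))
    sampled_negatives = rng.sample(negatives, k)

    pairs = [(sentence, template.format(label), "entailment")
             for label in true_labels]
    pairs += [(sentence, template.format(label), "contradiction")
              for label in sampled_negatives]
    return pairs


# Example from the question: Sentence 1 with true classes A, B, C.
# The labels D..G are made up to represent the remaining classes.
examples = build_nli_pairs(
    sentence="Sentence 1.",
    true_labels=["A", "B", "C"],
    all_labels=["A", "B", "C", "D", "E", "F", "G"],
)

for premise, hypothesis, nli_label in examples:
    print(premise, "|", hypothesis, "|", nli_label)
```

For the example sentence this yields three entailment pairs (A, B, C) and three contradiction pairs drawn from the remaining labels. If a sentence has more true labels than there are remaining labels, the sketch simply produces fewer contradictions than entailments.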