Project: Create a new zero-shot model with NLI data

Description

The zero-shot classification pipeline has become very popular on Hugging Face. It allows you to classify a text into any set of categories without having to fine-tune a model for the specific classification task you are interested in.
The zero-shot pipeline is based on models trained on Natural Language Inference (NLI). This project will train a new NLI model, which can then be used in the zero-shot classification pipeline.
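The mechanics can be sketched without any model: the pipeline turns each candidate label into an NLI hypothesis (by default something like "This example is {label}.") and ranks the labels by the model's entailment probability for that hypothesis. Here is a minimal mock-up of that idea; `mock_entailment_logit` is a toy stand-in for a real NLI model, not part of any library:

```python
import math

def zero_shot_rank(text, labels, entailment_logit_fn,
                   template="This example is {}."):
    """Rank candidate labels by entailment probability.

    entailment_logit_fn(premise, hypothesis) stands in for a real NLI
    model and returns a raw entailment score (logit) for the pair.
    """
    logits = [entailment_logit_fn(text, template.format(l)) for l in labels]
    # Softmax over the entailment logits, as the pipeline does when the
    # candidate labels are treated as mutually exclusive.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(labels, probs), key=lambda pair: -pair[1])

# Toy scoring function: a real model would score semantic entailment;
# this one just checks for a hand-picked keyword to keep the demo runnable.
def mock_entailment_logit(premise, hypothesis):
    return 2.0 if "technology" in hypothesis and "phone" in premise else 0.0

ranked = zero_shot_rank("The new phone has a great camera",
                        ["technology", "politics"],
                        mock_entailment_logit)
```

In real use the scoring function is an NLI model's forward pass; the template trick is what lets a single model handle arbitrary label sets.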

Model(s)

Any base model can be used. Since there are already several NLI models on the model hub, I suggest training a new model based on Microsoft’s DeBERTa-v3 model. Version three was only published a few weeks ago and can outperform larger models (see an example here).
We can probably create a new SOTA NLI model with the new DeBERTa version and enough NLI data.

Datasets

Established NLI datasets include:
MultiNLI
SNLI
ANLI

Other interesting NLI datasets include:
FEVER-NLI
DocNLI
LingNLI
More datasets can be included!

Challenges

  • NLI models can be trained either as 3-class classifiers (entailment/neutral/contradiction) or as 2-class classifiers (entailment/not_entailment). Each setup has its own advantages and disadvantages.
  • There is a lot of NLI data (2 million+ texts in the datasets linked above), which makes training computationally expensive. Optimising the training pipeline is a challenge.
  • Many different datasets can be translated into NLI-format. Including more datasets can be beneficial, but requires manual transformation of datasets.
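For the 2-class setup, the usual trick is to collapse "neutral" and "contradiction" into a single "not_entailment" class. A minimal sketch of that relabelling for examples stored as (premise, hypothesis, label) triples — the label ids here follow the common MNLI convention 0=entailment, 1=neutral, 2=contradiction, which is an assumption you should verify per dataset:

```python
# MNLI-style 3-class label ids (assumed convention; check the actual dataset)
THREE_CLASS = {0: "entailment", 1: "neutral", 2: "contradiction"}
# Collapse neutral and contradiction into one negative class
TWO_CLASS = {"entailment": 0, "not_entailment": 1}

def to_two_class(label_id):
    name = THREE_CLASS[label_id]
    return TWO_CLASS["entailment" if name == "entailment" else "not_entailment"]

examples = [
    ("A man is eating.", "A person eats.", 0),      # entailment
    ("A man is eating.", "A man is sleeping.", 2),  # contradiction
    ("A man is eating.", "A man eats pasta.", 1),   # neutral
]
binary = [(p, h, to_two_class(y)) for p, h, y in examples]
```

The 2-class variant doubles the negative class size and matches how the zero-shot pipeline actually uses the model (only the entailment score matters), at the cost of discarding the neutral/contradiction distinction.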

Desired project outcomes

  • Create a Streamlit or Gradio app on :hugs: Spaces that provides an interface for zero-shot classification with a new NLI model in the backend.
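As a rough sketch of what such a Space could look like with Gradio (the checkpoint name is one of the NLI models mentioned later in this thread; everything else here is an illustrative assumption, not a finished app):

```python
import gradio as gr
from transformers import pipeline

# Any NLI checkpoint works as the backend; this name is just an example.
classifier = pipeline("zero-shot-classification",
                      model="MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli")

def classify(text, labels):
    # Labels arrive as a comma-separated string from the textbox.
    result = classifier(text,
                        candidate_labels=[l.strip() for l in labels.split(",")])
    return dict(zip(result["labels"], result["scores"]))

demo = gr.Interface(
    fn=classify,
    inputs=[gr.Textbox(label="Text to classify"),
            gr.Textbox(label="Comma-separated candidate labels")],
    outputs=gr.Label(label="Scores"),
)

if __name__ == "__main__":
    demo.launch()
```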

Additional resources

See the links to the datasets above. Also see Joe Davison’s original blog post on the zero-shot pipeline

Discord channel

To chat and organise with other people interested in this project, head over to our Discord and:

  • Follow the instructions on the #join-course channel

  • Join the #zero-shot channel

Just make sure you comment here to indicate that you’ll be contributing to this project :slight_smile:


Hey! I’d love to contribute to this one. I guess further discussion will take place in Discord?


Hey @HarrySaini, yep Discord is probably the best place to coordinate / discuss more efficiently :slight_smile:


Interested. Messaged in Discord as well.


I happen to have trained a bilingual MNLI model recently, so I decided to give this project a shot as well (without taking a place on the team). In theory this should work on English and Russian. Any feedback is very welcome.


@MoritzLaurer was there any update on this? I would be very interested in a superior model to facebook/bart-large-mnli

yeah, this one is better than bart-large-mnli: MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli · Hugging Face (or this one: MoritzLaurer/DeBERTa-v3-base-mnli-fever-docnli-ling-2c · Hugging Face) and they should also be faster
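Both checkpoints plug straight into the zero-shot pipeline. A quick usage sketch (this downloads the model on first run, so treat it as illustrative):

```python
from transformers import pipeline

# Either of the checkpoints mentioned above works here.
classifier = pipeline("zero-shot-classification",
                      model="MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli")

result = classifier("Angela Merkel is a politician in Germany",
                    candidate_labels=["politics", "sports", "economy"])
print(result["labels"][0], result["scores"][0])
```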


Great! Thanks, I will be comparing them for zero-shot text classification.

Hi @MoritzLaurer
Is there a way to update the first post with some example notebooks on how to take any transformer-based model and one or more NLI datasets and fine-tune a new zero-shot text classifier? (Some architectures, like BERT, are missing from the Model Hub; I would like to add those.)

I used the code in the run_xnli script and it worked well! :slight_smile:
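For reference, an invocation along these lines worked for me — the arguments come from the transformers `run_xnli.py` example script, but the model name, hyperparameters, and output path below are placeholders to adjust for your own run:

```shell
python run_xnli.py \
  --model_name_or_path microsoft/deberta-v3-base \
  --language en \
  --train_language en \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 2 \
  --max_seq_length 128 \
  --output_dir ./deberta-v3-xnli
```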