Improving zero-shot-classifier performance?

serpa4 · June 5, 2025, 6:03pm

Hello!

I'm trying to learn and get more familiar with AI and ML, so I've been tinkering. I want a classifier that sorts short book descriptions (around 70 words) into comedy and non-comedy.
Since the data is all unlabeled, zero-shot classification seems like the most suitable way to “understand” the text and classify the books. I am using “facebook/bart-large-mnli” because, after some research, it seemed like the right tool for this.

However, after running my code, the results are pretty poor. The system classifies everything as comedy with scores of 0.8-0.95, even books I know for a fact are more serious or even tragic.

I am at a bit of a loss. I have no clue what to do to improve the performance. Here is my code:

import pandas as pd
from transformers import pipeline


pipe = pipeline("zero-shot-classification", model = "facebook/bart-large-mnli")


def load_csv_file(filename):
    return pd.read_csv(filename)


if __name__ == "__main__":
    comedy_score_zero_shot = []
    df = load_csv_file("books.csv")
    print(df)
    for index, row in df.iterrows():
        prompt = f'Given the synopsis of the following work: {row["description"]}. Is it a comedy or a serious work?'
        labels = [
            "comedic",
            "serious",
            ]
        
        #print(prompt)
        result = pipe(prompt, labels, hypothesis_template="This play is {}.")
        # Single quotes inside the f-string keep this valid on Python < 3.12
        print(f"{index}  {row['title']}: {result['scores']}")
        comedy_score_zero_shot.append(result["scores"][0])
    
    df_scores = pd.DataFrame({'comedy_score_Zero_shot': comedy_score_zero_shot})
    df_scores.to_csv('comedy_scores.csv', index=False)

Is something wrong? Am I missing something regarding zero-shot-classification? Is the prompt too long? Is this not the appropriate model to be using in this case? Any suggestions or criticisms are welcome!
Thanks!

John6666 · June 6, 2025, 12:14pm

I don’t think this model has conversational abilities the way an instruction-tuned LLM does, so for now I think it is better to pass only the data as input, without a question wrapped around it.

#prompt = f'Given the synopsis of the following work: {row["description"]}. Is it a comedy or a serious work?'
prompt = row["description"]
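
One more thing that may be worth checking: the zero-shot pipeline returns its labels and scores sorted from highest to lowest score, so result["scores"][0] is the score of whichever label won, not necessarily the score for "comedic". With two labels that value is always at least 0.5, which could make every book look like a comedy in the saved CSV. Below is a minimal sketch of looking the score up by label instead. The helper name score_for_label and the example numbers are made up for illustration; the dict only mimics the shape of a pipeline result so the snippet runs without downloading the model.

```python
def score_for_label(result, label):
    """Return the score assigned to `label` in a zero-shot pipeline result.

    `result` is assumed to have the pipeline's output shape:
    {"sequence": ..., "labels": [...], "scores": [...]}, where labels and
    scores are parallel lists sorted by descending score.
    """
    return dict(zip(result["labels"], result["scores"]))[label]


# Hand-written stand-in for a pipeline result (made-up numbers).
example_result = {
    "sequence": "A grieving widow returns to her childhood home.",
    "labels": ["serious", "comedic"],
    "scores": [0.9, 0.1],
}

# Score of the winning label, which here is "serious"
print(example_result["scores"][0])
# Score for "comedic" specifically
print(score_for_label(example_result, "comedic"))
```

In the loop from the original post, this would mean appending score_for_label(result, "comedic") to comedy_score_zero_shot rather than result["scores"][0].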
