IMDb score prediction

frozomod · December 22, 2023, 9:21pm

Hi everyone!
I have a corpus of dialogues from 600 movie scripts (about 33 MB) along with their IMDb scores and genres. I want to train a model that can predict a movie's score from its dialogue and genre.
Should I use a pre-trained model and fine-tune it on my data, or should I train from scratch? Which option will be more hardware demanding?

nielsr · December 23, 2023, 11:30am

It’s advised to start from a pre-trained model and fine-tune it on your custom dataset: 600 examples is a typical size for fine-tuning, but far too little to train from scratch. For reference, pre-training is done on terabytes of data, on clusters of GPUs, so it is by far the more hardware-demanding option.

If you want to predict the score, then you can use the AutoModelForSequenceClassification class and pass problem_type="regression" (as this is a regression problem), together with num_labels=1 so that the head outputs a single value:

from transformers import AutoModelForSequenceClassification

# num_labels=1 gives one output per example; labels must be floats
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=1, problem_type="regression")

This will make sure the mean-squared error (MSE) loss is used. Next, you can fine-tune it as shown in this tutorial (some updates need to be made to adapt it from classification to regression).
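The updates mentioned above mostly concern the labels and the evaluation metric. Below is a minimal sketch, not taken from the tutorial: the column name "score" and the helper name to_regression_example are hypothetical, and compute_metrics assumes the Trainer hands it a (predictions, labels) pair with predictions of shape (n, 1).

```python
import numpy as np


def to_regression_example(example):
    # Hypothetical column name: "score" is assumed to hold the IMDb rating.
    # The MSE loss expects float labels rather than integer class ids.
    return {"label": float(example["score"])}


def compute_metrics(eval_pred):
    # Assumes eval_pred unpacks into (predictions, labels).
    predictions, labels = eval_pred
    # Flatten (n, 1) predictions so they line up with the (n,) labels.
    predictions = np.asarray(predictions, dtype=np.float64).reshape(-1)
    labels = np.asarray(labels, dtype=np.float64).reshape(-1)
    errors = predictions - labels
    mse = float(np.mean(errors ** 2))
    return {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "mae": float(np.mean(np.abs(errors))),
    }
```

The first function could be applied with the dataset's map method before tokenization, and the second passed as the compute_metrics argument of Trainer in place of an accuracy metric.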
