Chapter 7 questions

Use this topic for any question about Chapter 7 of the course.

Hi - this is a relatively simple question, but I'm totally new to Hugging Face, so apologies in advance. In section 3 you discuss domain adaptation.

I'm just experimenting with the task at the end of the section, i.e. "To quantify the benefits of domain adaptation, fine-tune a classifier on the IMDb labels for both the pretrained and fine-tuned MiniLM checkpoints…"

Can you use the ‘Fill-Mask’ domain-adapted checkpoint you generated in the course (huggingface-course/distilbert-base-uncased-finetuned-imdb) for a classification task? Or do you have to adapt the original distilbert-base-uncased to the domain specifically for classification?

Not directly: you would still need to fine-tune it on the classification task next. It's just that the fine-tuned masked language model might make a better starting point, since it's more specialized on your corpus.

Hope that makes sense!

Thank you!! So I should just follow the standard fine-tuning methodology for classification from Ch. 3, but use the fine-tuned masked language model as the starting checkpoint? I.e. something like:

from transformers import TFAutoModelForSequenceClassification, AutoTokenizer

# Start from the domain-adapted masked LM checkpoint instead of distilbert-base-uncased
checkpoint = 'huggingface-course/distilbert-base-uncased-finetuned-imdb'

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# This loads the adapted encoder weights and adds a freshly initialized classification head
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

...

That’s correct indeed!
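For reference, the elided steps could look roughly like the Keras recipe from Chapter 3. This is just a minimal sketch that reuses the tokenizer and model from your snippet; the batch size, learning rate, epoch count, and variable names like tf_train_dataset are illustrative, not prescriptive:

from datasets import load_dataset
from transformers import DataCollatorWithPadding
import tensorflow as tf

# Load the raw IMDb reviews and tokenize them
imdb = load_dataset("imdb")
tokenized = imdb.map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)

# Pad dynamically per batch rather than to a global max length
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")

tf_train_dataset = tokenized["train"].to_tf_dataset(
    columns=["attention_mask", "input_ids"],
    label_cols=["label"],
    shuffle=True,
    collate_fn=data_collator,
    batch_size=16,
)
tf_eval_dataset = tokenized["test"].to_tf_dataset(
    columns=["attention_mask", "input_ids"],
    label_cols=["label"],
    shuffle=False,
    collate_fn=data_collator,
    batch_size=16,
)

# The model outputs logits, hence from_logits=True
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(tf_train_dataset, validation_data=tf_eval_dataset, epochs=3)

Running the same fine-tuning twice, once from this checkpoint and once from the original distilbert-base-uncased, gives you the comparison the exercise asks for.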

Thank you very much!