Chapter 7 questions

Use this topic for any question about Chapter 7 of the course.

Hi - this is a relatively simple question, but I’m totally new to HuggingFace, so apologies in advance. In section 3 you discuss domain adaptation.

I’m just experimenting with the task at the end of the section i.e. “To quantify the benefits of domain adaptation, fine-tune a classifier on the IMDb labels for both the pretrained and fine-tuned MiniLM checkpoints…”

Can you use the ‘Fill-Mask’ domain-adapted checkpoint you generated in the course (huggingface-course/distilbert-base-uncased-finetuned-imdb) for a classification task? Or do you have to adapt the original distilbert-base-uncased to the domain specifically for classification?

No, you would need to fine-tune it on the classification task next. It’s just that the fine-tuned masked language model might do better, since it’s more specialized on your corpus.

Hope that makes sense!

Thank you!! So I should just follow the standard fine-tuning methodology for classification as per Ch. 3 but use the ‘fine-tuned masked model’ as the starting checkpoint? i.e.

from transformers import TFAutoModelForSequenceClassification, AutoTokenizer

checkpoint = 'huggingface-course/distilbert-base-uncased-finetuned-imdb'

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)


That’s correct indeed!

Thank you very much!

Perhaps this is a typo? This sentence in the Question answering section

The answers field is a bit trickier as it comports a dictionary with two fields that are both lists.

I guess here it means comprises


This is with regard to the translation section.

I don’t understand the purpose of adding a padding token at the start of decoder_input_ids.
I understand that decoder_input_ids is the labels shifted by one.

Do the labels hold the ground truth, and when the decoder predicts the next token from decoder_input_ids, is that prediction compared to the labels as part of the training?

batch['labels'] = tensor([[   83,  7471,    23, ...]])
batch['decoder_input_ids'] = tensor([[59513,    83,  7471,    23, ...]])
where 59513 is the pad token.

Many thanks

The decoder is generating an output by predicting each token one after the other with:

  • the encoder hidden state from the inputs
  • the previously predicted tokens of the outputs.

But for the very first token, there are no previously predicted tokens, so we feed it a special token, which might be the pad token, or a special “beginning of sequence” (bos) token. This part depends on the exact model.

That’s why the decoder inputs are the labels shifted by one with this special token at the start.
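
As a rough illustration of that shift, here is a minimal pure-Python sketch. The helper name shift_right and the ignore_index default of -100 are illustrative assumptions, not the library's own code; the token ids are the first three from the question above, with 59513 as the start token.

```python
def shift_right(labels, start_token_id, pad_token_id, ignore_index=-100):
    """Build decoder inputs by shifting each label sequence one step right.

    Hypothetical helper for illustration only.
    """
    shifted = []
    for seq in labels:
        # Prepend the start token and drop the last label so the length is unchanged.
        row = [start_token_id] + list(seq[:-1])
        # Label positions masked with ignore_index cannot be fed to the model as ids.
        row = [pad_token_id if tok == ignore_index else tok for tok in row]
        shifted.append(row)
    return shifted


labels = [[83, 7471, 23]]
decoder_input_ids = shift_right(labels, start_token_id=59513, pad_token_id=59513)
print(decoder_input_ids)
```

With this layout, the decoder input at position t is the label at position t - 1, and the prediction made at position t is compared with the label at position t when the loss is computed.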


In this code snippet, what is eval_preds? I can see it is the argument to the compute_metrics function, but I don’t know what it is, so I don’t see why we can unpack it as a tuple.

def compute_metrics(eval_preds):
    preds, labels = eval_preds

I can see it is one of the arguments for Seq2SeqTrainer:

trainer = Seq2SeqTrainer(

But where is it getting its argument, eval_preds, from? The purpose of the compute_metrics function is to compare the predicted values with the actual values.
Are the labels (actual values) data_collator[‘labels’]?
Where do we get the predicted values from?

Many thanks.
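
For anyone with the same question, here is a minimal sketch of why the unpacking works. It assumes that the trainer gathers the model's predictions and the "labels" column of each batch during the evaluation loop, and passes both to compute_metrics in a single object that unpacks as (predictions, label_ids). The class EvalPredictionStandIn and the exact_match metric below are hypothetical stand-ins for illustration, not the library's own code.

```python
from typing import List, NamedTuple


class EvalPredictionStandIn(NamedTuple):
    """Hypothetical stand-in for the object passed to compute_metrics."""

    predictions: List[List[int]]
    label_ids: List[List[int]]


def compute_metrics(eval_preds):
    # Unpacking works because the object iterates as (predictions, label_ids).
    preds, labels = eval_preds
    matches = sum(p == l for p, l in zip(preds, labels))
    return {"exact_match": matches / len(labels)}


# Toy values: the first sequence matches its labels, the second does not.
eval_preds = EvalPredictionStandIn(
    predictions=[[83, 7471, 23], [5, 6, 7]],
    label_ids=[[83, 7471, 23], [5, 6, 8]],
)
result = compute_metrics(eval_preds)
print(result)
```

So the labels come from the "labels" entries of the batches built by the data collator, and the predictions come from running the model on those same batches.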

I have seen “labels” used in translation

model_inputs["labels"] = labels["input_ids"]

and in token classification

tokenized_inputs["labels"] = new_labels

Is “labels” always used to hold the ground-truth, please?

Many thanks

In the translation section, what is the difference between
AutoModelForSeq2SeqLM and AutoModelForCausalLM please?
Is it:
AutoModelForSeq2SeqLM is used for language translation tasks
AutoModelForCausalLM is only for text generation (e.g. GPT+)

Many thanks

In Chapter 7:
Task: Question answering
When running this code:

tf_train_dataset = train_dataset.to_tf_dataset(
tf_eval_dataset = validation_dataset.to_tf_dataset(
    columns=["input_ids", "attention_mask", "token_type_ids"],

I get this error :
TypeError: to_tf_dataset() missing 1 required positional argument: ‘collate_fn’
How should I fix this, especially as the validation and train datasets are already padded to the max length?


Hi @Abirate, thank you for this bug report! This is our fault - we recently changed the to_tf_dataset method to always require a collate_fn. I’m working on updating the course materials right now, and I’ll let you know as soon as a fixed version is available.


@Rocketknight1 Ok, thanks


I had this problem running the evaluation on Colab. Any ideas?

***** Running Evaluation *****
Num examples = 21018
Batch size = 64
[329/329 21:25]

TypeError Traceback (most recent call last)
in ()
----> 1 trainer.evaluate(max_length=max_target_length)

2 frames
/usr/local/lib/python3.7/dist-packages/transformers/ in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
2408 if all_losses is not None:
→ 2409 metrics[f"{metric_key_prefix}loss"] = all_losses.mean().item()
2411 # Prefix all keys with metric_key_prefix + '

TypeError: ‘NoneType’ object does not support item assignment


Hey @mbateman in which section in chapter 7 do you find this error? I’d like to run the relevant Colab notebook myself to see if I can reproduce the error :slight_smile:

Hi @lewtun thanks for getting back to me. This was in the fine tuning subsection of the translation section:


Doesn’t happen when run locally.

Hope that helps.


I encountered the same problem, and it seems the cause is that compute_metrics does not return anything: metric.compute is never called inside that function. Since the return value is missing, metrics ends up as None, which then triggers the NoneType item assignment error. Adding return metric.compute(predictions=decoded_preds, references=decoded_labels) to compute_metrics solved the problem for me.

import numpy as np

# Assumes tokenizer and metric are already defined, as in the course notebook.
def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # In case the model returns more than the prediction logits
    if isinstance(preds, tuple):
        preds = preds[0]

    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)

    # Replace -100s in the labels as we can't decode them
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    # Some simple post-processing
    decoded_preds = [pred.strip() for pred in decoded_preds]
    decoded_labels = [[label.strip()] for label in decoded_labels]
    return metric.compute(predictions=decoded_preds, references=decoded_labels)
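
To show the failure mode in isolation, here is a minimal sketch that needs no model or trainer. The function names and the add_loss helper are hypothetical; add_loss only mimics the step in the traceback above that writes the loss into the metrics dictionary.

```python
def compute_metrics_without_return(eval_preds):
    preds, labels = eval_preds
    score = sum(p == l for p, l in zip(preds, labels)) / len(labels)
    # Bug: score is computed but never returned, so the function returns None.


def compute_metrics_with_return(eval_preds):
    preds, labels = eval_preds
    score = sum(p == l for p, l in zip(preds, labels)) / len(labels)
    return {"accuracy": score}


def add_loss(metrics, loss):
    # Mimics the evaluation loop storing the loss in the metrics dictionary.
    metrics["eval_loss"] = loss
    return metrics


eval_preds = ([1, 0, 1, 1], [1, 1, 1, 1])

error_message = None
try:
    add_loss(compute_metrics_without_return(eval_preds), 0.5)
except TypeError as err:
    # Item assignment on None raises the same kind of error as in the traceback.
    error_message = str(err)
    print(error_message)

fixed = add_loss(compute_metrics_with_return(eval_preds), 0.5)
print(fixed)
```

Once the function returns a dictionary, the loss can be added to it and the error goes away.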

Thanks for catching this bug @PyaePK ! I’ll post a fix to the website and notebooks :hugs: