Chapter 7 questions

sgugger · November 15, 2021, 2:13pm

Use this topic for any question about Chapter 7 of the course.

emvy03 · November 22, 2021, 12:34pm

Hi - this is a relatively simple question, but I’m totally new to Hugging Face, so apologies in advance. In section 3 you discuss domain adaptation.

I’m just experimenting with the task at the end of the section i.e. “To quantify the benefits of domain adaptation, fine-tune a classifier on the IMDb labels for both the pretrained and fine-tuned MiniLM checkpoints…”

Can you use the ‘Fill-Mask’ domain-adapted checkpoint you generated in the course (huggingface-course/distilbert-base-uncased-finetuned-imdb) for a classification task? Or do you have to adapt the original distilbert-base-uncased to the domain specifically for classification?

sgugger · November 22, 2021, 12:51pm

No, you would need to fine-tune it on the classification task next. It’s just that the fine-tuned masked language model might do better, since it’s more specialized on your corpus.

Hope that makes sense!

emvy03 · November 22, 2021, 1:15pm

Thank you!! So I should just follow the standard fine-tuning methodology for classification as per Ch. 3 but use the ‘fine-tuned masked model’ as the starting checkpoint? i.e.

from transformers import TFAutoModelForSequenceClassification, AutoTokenizer

checkpoint = 'huggingface-course/distilbert-base-uncased-finetuned-imdb'

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

...

sgugger · November 22, 2021, 2:56pm

That’s correct indeed!

emvy03 · November 22, 2021, 7:17pm

Thank you very much!

ducatyb · December 26, 2021, 11:45pm

Perhaps this is a typo? This sentence in the Question answering section

The answers field is a bit trickier as it comports a dictionary with two fields that are both lists.

I guess here it means comprises

jon-fernandes · December 29, 2021, 11:56am

This is with regard to the translation section.

I don’t understand the purpose of adding a padding token at the start of the decoder_input_ids.
I understand that the decoder_input_ids are the labels shifted by one.

Do the labels hold the ground truth, and when we predict the next token from decoder_input_ids, do we then compare it to the labels as part of the training?
i.e.

batch['labels'] = tensor([[   83,  7471,    23, ...]])
batch['decoder_input_ids'] = tensor([[59513,    83,  7471,    23, ...]])
where 59513 is the pad token.

Many thanks

sgugger · December 29, 2021, 3:26pm

The decoder is generating an output by predicting each token one after the other with:

the encoder hidden state from the inputs
the previously predicted tokens of the outputs.

But for the very first token, there are no previously predicted tokens, so we feed it a special token, which might be the pad token or a special “beginning of sequence” (bos) token. This part depends on the exact model.

That’s why the decoder inputs are the labels shifted by one with this special token at the start.
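
As a rough illustration of that shift, here is a minimal sketch in plain Python. The standalone helper below and the trailing token id 0 are assumptions made for this example (the ids 83, 7471, 23 and the pad id 59513 come from the question above); the real model does this on tensors and may use a different start token.

```python
def shift_tokens_right(labels, decoder_start_token_id, pad_token_id):
    """Build decoder inputs: prepend the start token and drop the last label."""
    shifted = []
    for row in labels:
        new_row = [decoder_start_token_id] + list(row[:-1])
        # -100 marks label positions ignored by the loss; the decoder needs a real id there
        new_row = [pad_token_id if tok == -100 else tok for tok in new_row]
        shifted.append(new_row)
    return shifted


# Illustrative labels; the final 0 is an assumed end-of-sequence id
labels = [[83, 7471, 23, 0]]
decoder_input_ids = shift_tokens_right(
    labels, decoder_start_token_id=59513, pad_token_id=59513
)
print(decoder_input_ids)  # [[59513, 83, 7471, 23]]
```

At position i the decoder sees decoder_input_ids[i] (the previous target token) and is trained to predict labels[i], which is why the two sequences are offset by one.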

jon-fernandes · December 29, 2021, 5:47pm

In this code snippet, what is eval_preds? I can see it is the argument to the compute_metrics function, but I don’t know what it is, and hence why we know we can split it as a tuple.

def compute_metrics(eval_preds):
    preds, labels = eval_preds

I can see it is one of the arguments for Seq2SeqTrainer:

trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

But where is it getting its argument, eval_preds, from? The purpose of the compute_metrics function is to compare the predicted values with the actual values.
Are the labels (actual values) data_collator[‘labels’]?
Where do we get the predicted values from?

Many thanks.

jon-fernandes · December 29, 2021, 5:53pm

I have seen “labels” used in translation

model_inputs["labels"] = labels["input_ids"]

and in token classification

tokenized_inputs["labels"] = new_labels

Is “labels” always used to hold the ground-truth, please?

Many thanks

jon-fernandes · December 30, 2021, 3:25pm

In the translation section, what is the difference between
AutoModelForSeq2SeqLM and AutoModelForCausalLM please?
Is it:
AutoModelForSeq2SeqLM is used for language translation tasks
AutoModelForCausalLM is only for text generation (e.g. GPT+)

Many thanks

Abirate · January 6, 2022, 9:45pm

In Chapter 7:
Task: Question answering
When running this code:

tf_train_dataset = train_dataset.to_tf_dataset(
    columns=[
        "input_ids",
        "start_positions",
        "end_positions",
        "attention_mask",
        "token_type_ids",
    ],
    dummy_labels=True,
    shuffle=True,
    batch_size=16,
)
tf_eval_dataset = validation_dataset.to_tf_dataset(
    columns=["input_ids", "attention_mask", "token_type_ids"],
    shuffle=False,
    batch_size=16,
)

I get this error:
TypeError: to_tf_dataset() missing 1 required positional argument: 'collate_fn'
How should I handle this, given that the validation and train datasets are already padded to the max length?

Rocketknight1 · January 10, 2022, 5:29pm

Hi @Abirate, thank you for this bug report! This is our fault - we recently changed the to_tf_dataset method to always require a collate_fn. I’m working on updating the course materials right now, and I’ll let you know as soon as a fixed version is available.
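
For readers wondering what a collate_fn has to do when the examples are already padded: it only needs to group a list of per-example dicts into one batch. The sketch below shows that contract in plain Python. The function name stack_collate and the sample ids are hypothetical, and this is not the library's implementation. In practice you would probably pass a ready-made collator from transformers instead, such as DefaultDataCollator(return_tensors="tf"), but check that against the version you have installed.

```python
def stack_collate(examples):
    """Turn a list of per-example dicts into one dict of batched lists."""
    if not examples:
        raise ValueError("cannot collate an empty list of examples")
    keys = examples[0].keys()
    batch = {key: [example[key] for example in examples] for key in keys}
    # Stacking only works when every sequence feature has the same length
    for key, rows in batch.items():
        lengths = {len(row) for row in rows if isinstance(row, (list, tuple))}
        if len(lengths) > 1:
            raise ValueError(
                f"feature {key!r} has uneven lengths {sorted(lengths)}; pad first"
            )
    return batch


# Two already-padded examples with made-up token ids
examples = [
    {"input_ids": [101, 2054, 102, 0], "attention_mask": [1, 1, 1, 0], "start_positions": 1},
    {"input_ids": [101, 2073, 2003, 102], "attention_mask": [1, 1, 1, 1], "start_positions": 2},
]
batch = stack_collate(examples)
print(batch["input_ids"])
print(batch["start_positions"])
```

Because the datasets are padded to the max length up front, no extra padding logic is needed at collation time; the length check is only a guard for the case where that assumption does not hold.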

Abirate · January 10, 2022, 6:47pm

@Rocketknight1 Ok, thanks

mbateman · January 27, 2022, 5:00pm

Hi,

I had this problem running the evaluation on Colab. Any ideas?

***** Running Evaluation *****
Num examples = 21018
Batch size = 64
[329/329 21:25]

TypeError Traceback (most recent call last)
in ()
----> 1 trainer.evaluate(max_length=max_target_length)

2 frames
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
2407
2408 if all_losses is not None:
-> 2409 metrics[f"{metric_key_prefix}_loss"] = all_losses.mean().item()
2410
2411 # Prefix all keys with metric_key_prefix + '_'

TypeError: 'NoneType' object does not support item assignment

lewtun · January 27, 2022, 10:42pm

Hey @mbateman in which section in chapter 7 do you find this error? I’d like to run the relevant Colab notebook myself to see if I can reproduce the error

mbateman · January 28, 2022, 9:06am

Hi @lewtun, thanks for getting back to me. This was in the fine-tuning subsection of the translation section:

trainer.evaluate(max_length=max_target_length)
trainer.train()
trainer.evaluate(max_length=max_target_length)

Doesn’t happen when run locally.

Hope that helps.

Michael

PyaePK · February 14, 2022, 4:06pm

I encountered the same problem. It seems the cause is that compute_metrics does not return anything, and metric.compute is never called inside the function. Since the return value is missing, metrics ends up as None, which triggers the NoneType item assignment error. Adding return metric.compute(predictions=decoded_preds, references=decoded_labels) to compute_metrics solved the problem for me.

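import numpy as np

# tokenizer and metric are the objects defined earlier in the translation section
def compute_metrics(eval_preds):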
    preds, labels = eval_preds
    # In case the model returns more than the prediction logits
    if isinstance(preds, tuple):
        preds = preds[0]

    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)

    # Replace -100s in the labels as we can't decode them
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    # Some simple post-processing
    decoded_preds = [pred.strip() for pred in decoded_preds]
    decoded_labels = [[label.strip()] for label in decoded_labels]
    
    return metric.compute(predictions=decoded_preds, references=decoded_labels)

lewtun · February 18, 2022, 3:55pm

Thanks for catching this bug @PyaePK ! I’ll post a fix to the website and notebooks
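
To make the failure mode concrete, here is a minimal sketch in plain Python. It assumes eval_preds is a (predictions, labels) pair, as in the snippets above. The function toy_evaluation_loop is a hypothetical stand-in for the step in the traceback that adds the loss to the metrics, and the accuracy metric is only for illustration. None of this is the actual Trainer code.

```python
def compute_metrics_without_return(eval_preds):
    preds, labels = eval_preds
    accuracy = sum(p == l for p, l in zip(preds, labels)) / len(labels)
    # Bug: the score is computed but never returned, so the caller receives None


def compute_metrics_with_return(eval_preds):
    preds, labels = eval_preds
    accuracy = sum(p == l for p, l in zip(preds, labels)) / len(labels)
    return {"accuracy": accuracy}


def toy_evaluation_loop(compute_metrics, eval_preds, loss):
    """Hypothetical stand-in for the step that adds the loss to the metrics."""
    metrics = compute_metrics(eval_preds)
    metrics["eval_loss"] = loss  # raises TypeError when metrics is None
    return metrics


eval_preds = ([1, 0, 1, 1], [1, 0, 0, 1])
print(toy_evaluation_loop(compute_metrics_with_return, eval_preds, loss=0.25))

try:
    toy_evaluation_loop(compute_metrics_without_return, eval_preds, loss=0.25)
except TypeError as err:
    print(f"TypeError: {err}")
```

The version without a return statement fails only when the loss is written into the metrics, which matches the traceback pointing at trainer.py rather than at compute_metrics itself.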
