Chapter 3 questions

I was wondering, can we fine-tune a model with JAX, just like we fine-tuned with PyTorch here?

I couldn’t find any guide for that. How should I approach this?
Any suggestions would be great!
I was trying to fine-tune this VQGAN model, which has been pretrained using JAX.
Gently pinging @lewtun and @sgugger for suggestions on this. Thanks :slight_smile:

hey @khalidsaifullaah, here’s a tutorial for doing text classification for JAX/Flax: notebooks/text_classification_flax.ipynb at master · huggingface/notebooks · GitHub

in that repo there are also Flax examples for language modelling which might be closer to what you need for CLIP :slight_smile:
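
as a rough sketch (not taken from that notebook) of what a single training step can look like on the JAX/Flax side, assuming a BERT-style classification head and optax as the optimizer:

import jax
import jax.numpy as jnp
import optax
from transformers import AutoTokenizer, FlaxAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = FlaxAutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

optimizer = optax.adamw(learning_rate=2e-5)
opt_state = optimizer.init(model.params)

def loss_fn(params, batch, labels):
    # forward pass with the parameters we are differentiating with respect to
    logits = model(**batch, params=params).logits
    one_hot = jax.nn.one_hot(labels, logits.shape[-1])
    return optax.softmax_cross_entropy(logits, one_hot).mean()

@jax.jit
def train_step(params, opt_state, batch, labels):
    # compute loss and gradients, then apply one optimizer update
    loss, grads = jax.value_and_grad(loss_fn)(params, batch, labels)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state, loss

# toy batch of two sentences, just to show the shapes involved
batch = dict(tokenizer(["first example", "second example"], padding=True, return_tensors="np"))
labels = jnp.array([0, 1])
params, opt_state, loss = train_step(model.params, opt_state, batch, labels)

the full fine-tuning loop is essentially this step repeated over batches; evaluation, checkpointing etc. are built around it.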

Ah! Thank you very much

Is there any notebook available for fine-tuning a GPT-2 model on a text-generation (poem/song/etc.) task?
We were hoping to fine-tune our pretrained Bengali GPT-2 model; any pointers would help! Thanks

This will be covered in chapter 7. In the meantime, you can look at the language modeling scripts and notebooks to see how to fine-tune a language model on a new corpus.
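
As a rough sketch of the approach those scripts take, here is one way to fine-tune a causal language model with the Trainer (the "gpt2" checkpoint and the poems.txt file below are just placeholders, not something from the course):

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_checkpoint = "gpt2"  # swap in your pretrained Bengali GPT-2 checkpoint here
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no padding token by default
model = AutoModelForCausalLM.from_pretrained(model_checkpoint)

# "poems.txt" is a hypothetical plain-text file, one poem/song per line
raw_dataset = load_dataset("text", data_files={"train": "poems.txt"})

def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, max_length=128)

tokenized_dataset = raw_dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# mlm=False means standard left-to-right (causal) language modeling
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

training_args = TrainingArguments("gpt2-finetuned-poems", num_train_epochs=3,
                                  per_device_train_batch_size=8)
trainer = Trainer(model=model, args=training_args,
                  train_dataset=tokenized_dataset["train"], data_collator=data_collator)
trainer.train()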

Thanks a lot @sgugger
Eagerly looking forward to the future lessons! :heart:

Hi there! I have to say, the course is amazing. It explains everything at a high enough level that you can understand all the steps perfectly without having to dive into what’s under the hood. I do have a question: how would you use a model like the one you have fine-tuned here in a pipeline? It seems like the text classification pipeline only accepts one sentence (or a list of single sentences).

great question @andy13771 !

in general i think you can use the [SEP] token in your inputs to tell the pipeline which part belongs to sentence 1 and sentence 2. this token will differ from tokenizer to tokenizer, but usually [SEP] works for BERT-based models while other models like RoBERTa use </s>

@lewtun Well, I’m trying to do that with a model I have fine-tuned. It has somewhere around 95% accuracy. I’m taking 20 test samples and feeding them to the pipeline with a ‘[SEP]’ in between the two sentences, and it seems to always predict label 0.
I also tried doing that in the same Colab notebook from the course and it does the same thing (except it always predicts 1, but that’s not the point).
You can see it here:

thanks for sharing your notebook @andy13771 - that really helps!

i now think i was incorrect about simply using [SEP] in the pipeline for BERT-based models with sentence-pair tasks like mrpc :grimacing:

the problem is that BERT’s tokenizer relies on token_type_ids to keep track of which tokens belong to the first / second sentence, and with just a single string input like

"sentence 1 [SEP] sentence 2"

it assigns a 0 ID to each token. (you can verify this for yourself by passing two sentences to a BERT tokenizer and comparing the token_type_ids vs those with a single string)
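
for example, a quick way to see the difference (assuming a standard BERT checkpoint):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

pair = tokenizer("sentence 1", "sentence 2")
single = tokenizer("sentence 1 [SEP] sentence 2")

print(pair["token_type_ids"])    # 0s for the first sentence, 1s for the second
print(single["token_type_ids"])  # all 0s, so the model can't tell the sentences apart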

so it seems that for BERT models, we can’t hack the pipeline for sentence-pair tasks. however, there are other models like RoBERTa which don’t rely on token_type_ids at all! for these models, the separation token is </s></s>, so the following example shows we get the correct prediction for the first training example of the MRPC dataset: textattack/roberta-base-MRPC · Hugging Face.

similarly, models like DistilBERT don’t rely on token_type_ids so for these you can use the [SEP] token trick: textattack/distilbert-base-cased-MRPC · Hugging Face.
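
e.g. a quick sketch of that trick with the DistilBERT checkpoint above (the two sentences here are just made-up paraphrases, not taken from MRPC):

from transformers import pipeline

# DistilBERT ignores token_type_ids, so a single string with [SEP] works
classifier = pipeline("text-classification", model="textattack/distilbert-base-cased-MRPC")
print(classifier("The cat sat on the mat. [SEP] A cat was sitting on the mat."))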

hth!

You can use a text classification pipeline for pairs of sentences, though it’s a bit obscure :slight_smile:
The key is to pass a list of pairs of sentences to the pipeline object, taking your example in Colab:

from transformers import pipeline

cls = pipeline('text-classification', model='testing-pipeline')
for i in range(20):
    print(cls([[raw_datasets['test'][i]['sentence1'], raw_datasets['test'][i]['sentence2']]]))

(note the double brackets: they give a list containing one pair of sentences).

@lewtun yep, that makes sense, forgot about token_type_ids, thanks

@sgugger cool, I’ll try that, thanks

In the notebook for Chapter 3, we have this example of getting predictions for a batch:
predictions = trainer.predict(tokenized_datasets["validation"])

If I want to make a prediction for just a single sentence pair using my model:
s1='Today is a sunny day'
s2='Yesterday was a sunny day'

How would I do this, please?

Many thanks

Just put it in a list after preprocessing it. Something like

predictions = trainer.predict([tokenizer(s1, s2, return_tensors="pt")])

should work
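
Alternatively, here is a small sketch (assuming the tokenize_function and trainer from the chapter are still defined) that wraps the pair in a one-row Dataset so it goes through the same preprocessing as the validation set:

from datasets import Dataset

# build a one-row dataset with the same column names as MRPC
single_pair = Dataset.from_dict({"sentence1": [s1], "sentence2": [s2]})

# reuse the chapter's tokenize_function so the columns match what the Trainer expects
single_pair = single_pair.map(tokenize_function, batched=True)

predictions = trainer.predict(single_pair)
print(predictions.predictions)  # raw logits; take the argmax to get the predicted label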

Thanks, @sgugger.

In the example in chapter 3 we use
trainer.predict(tokenized_datasets["validation"])

I can’t figure out how to get s1 and s2 into a format (i.e. the preprocessing) that would let me do what you have suggested.

  • I think this means I need input_ids, attention_mask, etc., but I can’t figure out how to get there the way you did with map(tokenize_function).

Any help gratefully received. Many thanks!

Can anyone help answer my question above please?

Many thanks.

In the “A full training” section, we talk about using the DataLoader to break the data into batches, which are then sent to the model like so:

outputs = model(**batch)

I want to change the default loss function that the model uses when comparing predictions to ground-truth labels. How can I specify a loss function different from the default in the training loop code below?

from tqdm.auto import tqdm
progress_bar = tqdm(range(num_training_steps))
model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()

        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)

You can use the outputs.logits instead of outputs.loss, and pass them through the loss function of your choice, along with batch["labels"].
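
For example, here is a sketch of the same loop with a class-weighted cross-entropy swapped in (the weights are made up purely for illustration):

import torch
from torch.nn import CrossEntropyLoss

# hypothetical class weights, just to show a non-default loss
loss_fct = CrossEntropyLoss(weight=torch.tensor([1.0, 2.0]).to(device))

for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        # compute the loss from the logits instead of using outputs.loss
        loss = loss_fct(outputs.logits, batch["labels"])
        loss.backward()

        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)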

Like others, I had the same issue due to the missing line in the update_state of the F1_metric class. The updated code fixed it, but when I tried to add tf.keras.metrics.Precision() to metrics in the compile method, I got the same error. I basically adapted your code and it works, but I wonder why tf.keras.metrics.Precision() doesn’t work as is. I can see how the first F1_metric didn’t work (I suppose we were comparing the true class, of shape (None, 1), with class probabilities, of shape (None, 2)), but I would have expected tf.keras.metrics.Precision() to handle that automatically. Does it not?

class Precision_metric(tf.keras.metrics.Metric):
    def __init__(self, name='precision', **kwargs):
        super().__init__(name=name, **kwargs)
        # wrap the built-in precision metric
        self.precision = tf.keras.metrics.Precision()

    def update_state(self, y_true, y_pred, sample_weight=None):
        # turn the (None, 2) class probabilities into class indices first
        class_preds = tf.math.argmax(y_pred, axis=1)
        self.precision.update_state(y_true, class_preds, sample_weight)

    def reset_state(self):
        self.precision.reset_state()

    def result(self):
        return self.precision.result()

Sorry for the delay in replying! We’re actually pulling that section from the updated course - it was quite confusing, and it wasn’t really much help, since you could only use that approach to compute the F1 metric, and not more complex NLP metrics like BLEU and ROUGE. Instead, we’re working on a new Keras callback for automatically computing more arbitrary metrics, which should hopefully be both simpler and much more useful than hacking in the F1 as a Keras metric!
