Chapter 3 questions

I was wondering, can we fine-tune a model with JAX, just like we fine-tuned with PyTorch here?

I couldn’t find any guide for that. How should I approach this?
Any suggestions would be great!
I was trying to fine-tune this VQGAN model, which has been pretrained using JAX.
Gently pinging @lewtun and @sgugger for suggestions on this. Thanks :slight_smile:

hey @khalidsaifullaah, here’s a tutorial for doing text classification for JAX/Flax: notebooks/text_classification_flax.ipynb at master · huggingface/notebooks · GitHub

in that repo there are also Flax examples for language modelling which might be closer to what you need for CLIP :slight_smile:
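
as a rough sketch (not taken from that notebook) of what a single training step can look like on the JAX/Flax side, assuming a BERT-style classification head and optax as the optimizer:

import jax
import jax.numpy as jnp
import optax
from transformers import AutoTokenizer, FlaxAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = FlaxAutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

optimizer = optax.adamw(learning_rate=2e-5)
opt_state = optimizer.init(model.params)

def loss_fn(params, batch, labels):
    # forward pass with the parameters we are differentiating with respect to
    logits = model(**batch, params=params).logits
    one_hot = jax.nn.one_hot(labels, logits.shape[-1])
    return optax.softmax_cross_entropy(logits, one_hot).mean()

@jax.jit
def train_step(params, opt_state, batch, labels):
    # compute loss and gradients, then apply one optimizer update
    loss, grads = jax.value_and_grad(loss_fn)(params, batch, labels)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state, loss

# toy batch of two sentences, just to show the shapes involved
batch = dict(tokenizer(["first example", "second example"], padding=True, return_tensors="np"))
labels = jnp.array([0, 1])
params, opt_state, loss = train_step(model.params, opt_state, batch, labels)

the full fine-tuning loop is essentially this step repeated over batches; evaluation, checkpointing etc. are built around it.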

Ah! Thank you very much

Is there any notebook available for fine-tuning a GPT-2 model on a text-generation (poem/song/etc.) task?
We were hoping to fine-tune our pretrained Bengali GPT-2 model; any pointers would help! Thanks

This will be covered in chapter 7. In the meantime, you can look at the language modeling scripts and notebooks to see how to fine-tune a language model on a new corpus.
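
As a rough sketch of the approach those scripts take, here is one way to fine-tune a causal language model with the Trainer (the "gpt2" checkpoint and the poems.txt file below are just placeholders, not something from the course):

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_checkpoint = "gpt2"  # swap in your pretrained Bengali GPT-2 checkpoint here
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no padding token by default
model = AutoModelForCausalLM.from_pretrained(model_checkpoint)

# "poems.txt" is a hypothetical plain-text file, one poem/song per line
raw_dataset = load_dataset("text", data_files={"train": "poems.txt"})

def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, max_length=128)

tokenized_dataset = raw_dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# mlm=False means standard left-to-right (causal) language modeling
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

training_args = TrainingArguments("gpt2-finetuned-poems", num_train_epochs=3,
                                  per_device_train_batch_size=8)
trainer = Trainer(model=model, args=training_args,
                  train_dataset=tokenized_dataset["train"], data_collator=data_collator)
trainer.train()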

Thanks a lot @sgugger
Eagerly looking forward to the future lessons! :heart:

Hi there! I have to say, the course is amazing. It explains everything at a high enough level that you can understand all the steps perfectly without having to dive into what’s under the hood. I do have a question: how would you use a model like the one you have fine-tuned here in a pipeline? It seems like the text classification pipeline only accepts one sentence (or a list of single sentences).

great question @andy13771 !

in general i think you can use the [SEP] token in your inputs to tell the pipeline which part belongs to sentence 1 and sentence 2. this token will differ from tokenizer to tokenizer, but usually [SEP] works for BERT-based models while other models like RoBERTa use </s>

@lewtun Well, I’m trying to do that with a model I have fine-tuned. It has somewhere around 95% accuracy. I’m taking 20 test samples and feeding them to the pipeline with a ‘[SEP]’ in between the two sentences, and it seems to always predict label 0.
I also tried doing that in the same Colab notebook from the course and it does the same thing (except it always predicts 1, but that’s not the point).
You can see it here:

thanks for sharing your notebook @andy13771 - that really helps!

i now think i was incorrect about simply using [SEP] in the pipeline for BERT-based models with sentence-pair tasks like mrpc :grimacing:

the problem is that BERT’s tokenizer relies on token_type_ids to keep track of which tokens belong to the first / second sentence, and with just a single string input like

"sentence 1 [SEP] sentence 2"

it assigns a 0 ID to each token. (you can verify this for yourself by passing two sentences to a BERT tokenizer and comparing the token_type_ids vs those with a single string)
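
for example, a quick way to see the difference (assuming a standard BERT checkpoint):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

pair = tokenizer("sentence 1", "sentence 2")
single = tokenizer("sentence 1 [SEP] sentence 2")

print(pair["token_type_ids"])    # 0s for the first sentence, 1s for the second
print(single["token_type_ids"])  # all 0s, so the model can't tell the sentences apart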

so it seems that for BERT models, we can’t hack the pipeline for sentence-pair tasks. however, there are other models like RoBERTa which don’t rely on token_type_ids at all! for these models, the separation token is </s></s>, so the following example shows we get the correct prediction for the first training example of the MRPC dataset: textattack/roberta-base-MRPC · Hugging Face.

similarly, models like DistilBERT don’t rely on token_type_ids so for these you can use the [SEP] token trick: textattack/distilbert-base-cased-MRPC · Hugging Face.
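
e.g. a quick sketch of that trick with the DistilBERT checkpoint above (the two sentences here are just made-up paraphrases, not taken from MRPC):

from transformers import pipeline

# DistilBERT ignores token_type_ids, so a single string with [SEP] works
classifier = pipeline("text-classification", model="textattack/distilbert-base-cased-MRPC")
print(classifier("The cat sat on the mat. [SEP] A cat was sitting on the mat."))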

hth!

You can use a text classification pipeline for pairs of sentences, though it’s a bit obscure :slight_smile:
The key is to pass a list of pairs of sentences to the pipeline object, taking your example in Colab:

from transformers import pipeline

cls = pipeline('text-classification', model='testing-pipeline')
for i in range(20):
    print(cls([[raw_datasets['test'][i]['sentence1'], raw_datasets['test'][i]['sentence2']]]))

(note the double brackets: they give a list containing one pair of sentences).

@lewtun yep, that makes sense, forgot about token_type_ids, thanks

@sgugger cool, I’ll try that, thanks

In the notebook for Chapter 3, we have this example of getting predictions for a batch:
predictions = trainer.predict(tokenized_datasets["validation"])

If I want to make a prediction for just a single sentence pair using my model:
s1='Today is a sunny day'
s2='Yesterday was a sunny day'

How would I do this, please?

Many thanks

Just put it in a list after preprocessing it. Something like

predictions = trainer.predict([tokenizer(s1, s2, return_tensors="pt")])

should work
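
Alternatively, here is a small sketch (assuming the tokenize_function and trainer from the chapter are still defined) that wraps the pair in a one-row Dataset so it goes through the same preprocessing as the validation set:

from datasets import Dataset

# build a one-row dataset with the same column names as MRPC
single_pair = Dataset.from_dict({"sentence1": [s1], "sentence2": [s2]})

# reuse the chapter's tokenize_function so the columns match what the Trainer expects
single_pair = single_pair.map(tokenize_function, batched=True)

predictions = trainer.predict(single_pair)
print(predictions.predictions)  # raw logits; take the argmax to get the predicted label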

Thanks, @sgugger.

In the example in chapter 3 we use
trainer.predict(tokenized_datasets["validation"])

I can’t figure out how to get s1 and s2 into a format (i.e. the preprocessing) that would let me do what you have suggested.

  • I think this means I need input_ids, attention_mask, etc., but I can’t figure out how to get there the way you did with map(tokenize_function).

Any help gratefully received. Many thanks!

Can anyone help answer my question above please?

Many thanks.

In the “A full training” section, we talk about using the DataLoader to break the data into batches, which are then sent to the model like so:

outputs = model(**batch)

I want to change the default loss function that the model uses when comparing predictions to ground-truth labels. How can I specify a loss function different from the default in the training loop code below?

from tqdm.auto import tqdm
progress_bar = tqdm(range(num_training_steps))
model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()

        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)

You can use the outputs.logits instead of outputs.loss, and pass them through the loss function of your choice, along with batch["labels"].
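
For example, here is a sketch of the same loop with a class-weighted cross-entropy swapped in (the weights are made up purely for illustration):

import torch
from torch.nn import CrossEntropyLoss

# hypothetical class weights, just to show a non-default loss
loss_fct = CrossEntropyLoss(weight=torch.tensor([1.0, 2.0]).to(device))

for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        # compute the loss from the logits instead of using outputs.loss
        loss = loss_fct(outputs.logits, batch["labels"])
        loss.backward()

        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)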

Like others, I had the same issue due to the missing line in the update_state of the F1_metric class. The updated code fixed it, but when I tried to add tf.keras.metrics.Precision() to metrics in the compile method, I got the same error. I basically adapted your code and it works, but I wonder why tf.keras.metrics.Precision() doesn’t work as is. I can see how the first F1_metric didn’t work (I suppose we were comparing the true class, of shape (None, 1), with class probabilities, of shape (None, 2)), but I would have expected tf.keras.metrics.Precision() to handle that automatically. Does it not?

class Precision_metric(tf.keras.metrics.Metric):
    def __init__(self, name='precision', **kwargs):
        super().__init__(name=name, **kwargs)
        # wrap the built-in precision metric
        self.precision = tf.keras.metrics.Precision()

    def update_state(self, y_true, y_pred, sample_weight=None):
        # turn the (None, 2) class probabilities into class indices first
        class_preds = tf.math.argmax(y_pred, axis=1)
        self.precision.update_state(y_true, class_preds, sample_weight)

    def reset_state(self):
        self.precision.reset_state()

    def result(self):
        return self.precision.result()

Sorry for the delay in replying! We’re actually pulling that section from the updated course - it was quite confusing, and it wasn’t really much help, since you could only use that approach to compute the F1 metric, and not more complex NLP metrics like BLEU and ROUGE. Instead, we’re working on a new Keras callback for automatically computing more arbitrary metrics, which should hopefully be both simpler and much more useful than hacking in the F1 as a Keras metric!
