Chapter 3 questions

sgugger · June 14, 2021, 2:52pm

Use this topic for any question about Chapter 3 of the course.

sgugger · June 17, 2021, 11:34am

You can look at the official examples:

there is the run_translation script
and the Translation notebook, which you can open in Colab.

The course will cover this and other tasks in section 2.

nielsr · June 17, 2021, 3:18pm

AutoModelForSeq2SeqLM is for encoder-decoder models like T5, BART, mBART, PEGASUS. You can for example initialize an AutoModelForSeq2SeqLM from t5-base, like so:

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

EncoderDecoderModel is different: it was created to use encoder-only or decoder-only models in an encoder-decoder setup (this is possible, as was shown in this paper). For example, BERT is an encoder-only model, but you can use it in an encoder-decoder setup as follows:

from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# Initializing a BERT bert-base-uncased style configuration
config_encoder = BertConfig()
config_decoder = BertConfig()

config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)

# Initializing a Bert2Bert model from the bert-base-uncased style configurations
model = EncoderDecoderModel(config=config)

You can also for example use BERT as encoder and GPT-2 as decoder.

Photons · June 21, 2021, 5:57am

I got the error below after running the last block of code in the ‘Fine-tuning a model with Keras’ notebook:

ValueError                                Traceback (most recent call last)
<ipython-input-14-d95f2fa5c30c> in <module>()
     11     np.array(raw_datasets['train']['label']),
     12     validation_data=(tokenized_datasets['validation'], np.array(raw_datasets['validation']['label'])),
---> 13     batch_size=8,
     14 )

9 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1181                 _r=1):
   1182               callbacks.on_train_batch_begin(step)
-> 1183               tmp_logs = self.train_function(iterator)
   1184               if data_handler.should_sync:
   1185                 context.async_wait()

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    887 
    888       with OptionalXlaContext(self._jit_compile):
--> 889         result = self._call(*args, **kwds)
    890 
    891       new_tracing_count = self.experimental_get_tracing_count()

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    931       # This is the first call of __call__, so we have to initialize.
    932       initializers = []
--> 933       self._initialize(args, kwds, add_initializers_to=initializers)
    934     finally:
    935       # At this point we know that the initialization is complete (or less

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in _initialize(self, args, kwds, add_initializers_to)
    762     self._concrete_stateful_fn = (
    763         self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
--> 764             *args, **kwds))
    765 
    766     def invalid_creator_scope(*unused_args, **unused_kwds):

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
   3048       args, kwargs = None, None
   3049     with self._lock:
-> 3050       graph_function, _ = self._maybe_define_function(args, kwargs)
   3051     return graph_function
   3052 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
   3442 
   3443           self._function_cache.missed.add(call_context_key)
-> 3444           graph_function = self._create_graph_function(args, kwargs)
   3445           self._function_cache.primary[cache_key] = graph_function
   3446 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   3287             arg_names=arg_names,
   3288             override_flat_arg_shapes=override_flat_arg_shapes,
-> 3289             capture_by_value=self._capture_by_value),
   3290         self._function_attributes,
   3291         function_spec=self.function_spec,

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    997         _, original_func = tf_decorator.unwrap(python_func)
    998 
--> 999       func_outputs = python_func(*func_args, **func_kwargs)
   1000 
   1001       # invariant: `func_outputs` contains only Tensors, CompositeTensors,

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
    670         # the function a weak reference to itself to avoid a reference cycle.
    671         with OptionalXlaContext(compile_with_xla):
--> 672           out = weak_wrapped_fn().__wrapped__(*args, **kwds)
    673         return out
    674 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    984           except Exception as e:  # pylint:disable=broad-except
    985             if hasattr(e, "ag_error_metadata"):
--> 986               raise e.ag_error_metadata.to_exception(e)
    987             else:
    988               raise

ValueError: in user code:

    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:855 train_function  *
        return step_function(self, iterator)
    <ipython-input-12-112004023d94>:11 update_state  *
        self.precision.update_state(y_true, y_pred, sample_weight)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/metrics_utils.py:86 decorated  **
        update_op = update_state_fn(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/metrics.py:177 update_state_fn
        return ag_update_state(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/metrics.py:1337 update_state  **
        sample_weight=sample_weight)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/metrics_utils.py:366 update_confusion_matrix_variables
        y_pred.shape.assert_is_compatible_with(y_true.shape)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/tensor_shape.py:1161 assert_is_compatible_with
        raise ValueError("Shapes %s and %s are incompatible" % (self, other))

    ValueError: Shapes (None, 2) and (None, 1) are incompatible

sgugger · June 21, 2021, 12:08pm

Pinging @Rocketknight1

Rocketknight1 · June 21, 2021, 1:52pm

Hi, I’ve reproduced it - it’s an issue caused by an older version of the F1 metric code being included in the final notebook by accident. My fault, I’m going to quickly test and push a fix now.

@Photons thank you for catching that bug!

twistedstats · June 22, 2021, 12:27am

For the code example in this chapter, using the Trainer API in Colab runs on GPU, but the same script in my own Python virtual env trains on CPU.

This isn’t a problem however with the full training script using native pytorch (same python virtual env) where we explicitly define the device and transfer model and batch to device.

Can you provide some suggestions on how to diagnose the non-GPU detection using trainer API? Thanks.

sgugger · June 22, 2021, 12:06pm

The Trainer checks torch.cuda.is_available() behind the scenes so you should just inspect the result of that function.
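As a rough sketch of that check (the helper name describe_cuda_setup is made up for illustration, it is not part of the Trainer API), you could run something like this inside the same virtual env. It only relies on torch.cuda.is_available(), torch.version.cuda, torch.cuda.device_count() and torch.cuda.get_device_name(); one thing worth looking at is whether torch.version.cuda is None, which would indicate a CPU-only PyTorch build.

```python
def describe_cuda_setup():
    """Collect the facts that decide whether PyTorch can see a GPU."""
    info = {"torch_installed": False, "cuda_available": False}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment at all
        return info
    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # None here means the installed PyTorch build has no CUDA support
    info["built_with_cuda"] = torch.version.cuda
    # This is the call mentioned above
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device_count"] = torch.cuda.device_count()
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


report = describe_cuda_setup()
for key, value in report.items():
    print(f"{key}: {value}")
```

If cuda_available comes back False in the virtual env but True in Colab, the difference is in the environment (PyTorch build or driver setup) rather than in the training script.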

feynmanm · June 23, 2021, 2:06pm

Hi Matt,

How does the fix manifest? Today I’m having the exact same problem as Photons above. I’m replicating all of your code in my own notebook, and it’s not apparent to me what I need to update to bring in your fix. Thank you!

-Mike

Rocketknight1 · June 23, 2021, 3:12pm

Hi @feynmanm , I’ve linked the updated F1 metric code below, which should be in the Colab notebook now. Can you confirm your Metric class looks like this (the update_state method has been changed), and if you’re using this version but still getting errors can you paste the traceback you get?

class F1_metric(tf.keras.metrics.Metric):
    def __init__(self, name='f1_score', **kwargs):
        super().__init__(name=name, **kwargs)
        # Initialize our metric by initializing the two metrics it's based on:
        # Precision and Recall
        self.precision = tf.keras.metrics.Precision()
        self.recall = tf.keras.metrics.Recall()

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Update our metric by updating the two metrics it's based on
        class_preds = tf.math.argmax(y_pred, axis=1)
        self.precision.update_state(y_true, class_preds, sample_weight)
        self.recall.update_state(y_true, class_preds, sample_weight)

    def reset_state(self):
        self.precision.reset_state()
        self.recall.reset_state()

    def result(self):
        # To get the F1 result, we compute the harmonic mean of the current
        # precision and recall
        return 2 / ((1 / self.precision.result()) + (1 / self.recall.result()))
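To see why the argmax matters, here is a framework-free sketch in plain Python (not the notebook's code; the helper names argmax and f1_from_logits and the toy numbers are made up for illustration). The model outputs one score per class, so y_pred has two columns, while y_true holds a single label per example; taking the argmax over the class axis turns each row of scores into one predicted class id, so the shapes line up. The zero-division guard is an addition of this sketch and is not in the Keras class above.

```python
def argmax(row):
    """Return the index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def f1_from_logits(y_true, logits):
    """Binary F1 (positive class = 1) from labels and per-class scores."""
    # Collapse each row of per-class scores to a single class id,
    # mirroring tf.math.argmax(y_pred, axis=1) in the metric above.
    preds = [argmax(row) for row in logits]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Guard added for this sketch: avoid dividing by zero
    if precision == 0.0 or recall == 0.0:
        return 0.0
    # Harmonic mean of precision and recall
    return 2 / (1 / precision + 1 / recall)


labels = [1, 0, 1, 1]
logits = [[0.2, 1.5], [2.0, -1.0], [0.9, 0.1], [-0.3, 0.4]]
print(f1_from_logits(labels, logits))
```

With these toy values the predicted classes are [1, 0, 0, 1], which gives precision 1.0, recall 2/3 and an F1 of about 0.8.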

feynmanm · June 23, 2021, 3:31pm

Updated update_state fixed the error, thanks!

khalidsaifullaah · June 26, 2021, 9:24pm

Chapter 3: A full training

I’m trying to run notebook_launcher(training_function) on the Colab TPU, but I’m getting lots of errors, mostly dependency-related. At first it was No module named 'accelerate', so I did pip install accelerate. Now it’s giving me No module named 'torch_xla'…

harish3110 · June 27, 2021, 9:15am

When one uses a transformers model for a particular task like sequence classification, is there any way to first fine-tune the language model for the downstream task before doing sequence classification, as is done with models like ULMFiT? Are there any benefits of doing that for transformer models?

lewtun · June 28, 2021, 9:18am

hey @khalidsaifullaah thanks for reporting the bug! i’ll push a fix later today

lewtun · June 28, 2021, 9:36am

hey @harish3110 this is a great question!

indeed, one can always fine-tune the pretrained language model on your corpus before performing the supervised task (e.g. text classification).

in my experience, the gains in accuracy are strongly dependent on how different your corpus is from the one used during pretraining, e.g. if your corpus is something quirky like source code then fine-tuning the language model first can be quite beneficial since the target domain is quite different from say wikipedia.

having said that, what i usually do in practice is run a few supervised fine-tuning experiments first and then think about whether fine-tuning the language model is needed (since it takes longer to train).

hth!

harish3110 · June 29, 2021, 6:56pm

@lewtun Ah! Makes sense intuitively to do it but I haven’t seen it done in practice much. Never come across a Kaggle kernel that did that. Do you have any resources or links that show how one can fine-tune the language model with the HuggingFace library by any chance?

lewtun · June 30, 2021, 12:49pm

hey @harish3110 yes this kind of task is definitely something that comes up more in industry than research which is probably why you don’t see many examples online (except of course fast.ai which is often a few years ahead of the curve on best practices!)

you can check out the language modelling tutorial here which you can adapt for fine-tuning on your corpus / domain.

realjanpaulus · July 5, 2021, 12:04pm

Hey,

I have a question regarding Chapter 3.3: Fine-tuning a model with the Trainer API:

Why was the prediction made on the validation set and not on an additional test set which wasn’t used during training?

lewtun · July 5, 2021, 12:31pm

hey @realjanpaulus, we did this for simplicity, but you can also use the Trainer.predict function to get the predictions on the test set quite easily:

test_dataset = ...
test_preds = trainer.predict(test_dataset)

realjanpaulus · July 5, 2021, 12:48pm

Ah thank you
