Chapter 3 questions

Use this topic for any question about Chapter 3 of the course.

Hi @sgugger, thank you for the informative course. Do you have any documents or tutorials on fine-tuning a pretrained model for a translation task? Thank you very much.

You can look at the official examples:

The course will cover this and other tasks in section 2.


Thank you very much. Can I ask what the difference is between AutoModelForSeq2SeqLM and EncoderDecoderModel? Or are they the same thing, just called in different ways?

AutoModelForSeq2SeqLM is for encoder-decoder models like T5, BART, mBART, PEGASUS. You can for example initialize an AutoModelForSeq2SeqLM from t5-base, like so:

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

EncoderDecoderModel is different: it was created to use encoder-only or decoder-only models in an encoder-decoder setup (this is possible, as was shown in this paper). For example, BERT is an encoder-only model, but you can use it in an encoder-decoder setup as follows:

from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# Initializing a BERT bert-base-uncased style configuration
config_encoder = BertConfig()
config_decoder = BertConfig()

config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)

# Initializing a Bert2Bert model from the bert-base-uncased style configurations
model = EncoderDecoderModel(config=config)

You can also for example use BERT as encoder and GPT-2 as decoder.


Thank you very much. I will try both of them

I got the error below after running the last block of code in the ‘Fine-tuning a model with Keras’ notebook:

ValueError                                Traceback (most recent call last)
<ipython-input-14-d95f2fa5c30c> in <module>()
     11     np.array(raw_datasets['train']['label']),
     12     validation_data=(tokenized_datasets['validation'], np.array(raw_datasets['validation']['label'])),
---> 13     batch_size=8,
     14 )

9 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/ in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1181                 _r=1):
   1182               callbacks.on_train_batch_begin(step)
-> 1183               tmp_logs = self.train_function(iterator)
   1184               if data_handler.should_sync:
   1185                 context.async_wait()

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/ in __call__(self, *args, **kwds)
    888       with OptionalXlaContext(self._jit_compile):
--> 889         result = self._call(*args, **kwds)
    891       new_tracing_count = self.experimental_get_tracing_count()

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/ in _call(self, *args, **kwds)
    931       # This is the first call of __call__, so we have to initialize.
    932       initializers = []
--> 933       self._initialize(args, kwds, add_initializers_to=initializers)
    934     finally:
    935       # At this point we know that the initialization is complete (or less

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/ in _initialize(self, args, kwds, add_initializers_to)
    762     self._concrete_stateful_fn = (
    763         self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
--> 764             *args, **kwds))
    766     def invalid_creator_scope(*unused_args, **unused_kwds):

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/ in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
   3048       args, kwargs = None, None
   3049     with self._lock:
-> 3050       graph_function, _ = self._maybe_define_function(args, kwargs)
   3051     return graph_function

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/ in _maybe_define_function(self, args, kwargs)
   3443           self._function_cache.missed.add(call_context_key)
-> 3444           graph_function = self._create_graph_function(args, kwargs)
   3445           self._function_cache.primary[cache_key] = graph_function

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/ in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   3287             arg_names=arg_names,
   3288             override_flat_arg_shapes=override_flat_arg_shapes,
-> 3289             capture_by_value=self._capture_by_value),
   3290         self._function_attributes,
   3291         function_spec=self.function_spec,

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    997         _, original_func = tf_decorator.unwrap(python_func)
--> 999       func_outputs = python_func(*func_args, **func_kwargs)
   1001       # invariant: `func_outputs` contains only Tensors, CompositeTensors,

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/ in wrapped_fn(*args, **kwds)
    670         # the function a weak reference to itself to avoid a reference cycle.
    671         with OptionalXlaContext(compile_with_xla):
--> 672           out = weak_wrapped_fn().__wrapped__(*args, **kwds)
    673         return out

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ in wrapper(*args, **kwargs)
    984           except Exception as e:  # pylint:disable=broad-except
    985             if hasattr(e, "ag_error_metadata"):
--> 986               raise e.ag_error_metadata.to_exception(e)
    987             else:
    988               raise

ValueError: in user code:

    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/ train_function  *
        return step_function(self, iterator)
    <ipython-input-12-112004023d94>:11 update_state  *
        self.precision.update_state(y_true, y_pred, sample_weight)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/ decorated  **
        update_op = update_state_fn(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/ update_state_fn
        return ag_update_state(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/ update_state  **
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/ update_confusion_matrix_variables
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ assert_is_compatible_with
        raise ValueError("Shapes %s and %s are incompatible" % (self, other))

    ValueError: Shapes (None, 2) and (None, 1) are incompatible

Pinging @Rocketknight1

Hi, I’ve reproduced it - it’s an issue caused by an older version of the F1 metric code being included in the final notebook by accident. My fault, I’m going to quickly test and push a fix now.

@Photons thank you for catching that bug!


For the code example in this chapter, the Trainer API runs on the GPU in Colab, but the same script trains on the CPU in my own Python virtual environment.

This isn’t a problem, however, with the full training script using native PyTorch (same Python virtual env), where we explicitly define the device and transfer the model and batches to it.

Can you provide some suggestions on how to diagnose why the Trainer API does not detect the GPU? Thanks.

The Trainer checks torch.cuda.is_available() behind the scenes so you should just inspect the result of that function.
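As a rough sketch of that check, the snippet below gathers a few facts that usually explain a CPU fallback in a local environment. The helper name describe_gpu_visibility is hypothetical and not part of the Trainer API. It only assumes that PyTorch may or may not be installed in the environment being inspected.

```python
import importlib.util
import os


def describe_gpu_visibility():
    """Collect facts that may explain why PyTorch falls back to the CPU.

    Hypothetical helper for illustration; not part of transformers.
    """
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        # An empty string here hides every GPU from the process.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    # None means a CPU-only build of PyTorch is installed.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


for key, value in describe_gpu_visibility().items():
    print(f"{key}: {value}")
```

If built_with_cuda is None, the virtual environment most likely contains a CPU-only PyTorch build, which would explain why the explicit-device script and the Trainer behave differently from Colab.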

Hi Matt,

How does the fix manifest? Today I’m having the exact same problem as Photons above. I’m replicating all of your code in my own notebook, and it’s not apparent to me what I need to update to bring in your fix. Thank you!


Hi @feynmanm, I’ve linked the updated F1 metric code below, which should be in the Colab notebook now. Can you confirm your Metric class looks like this (the update_state method has been changed)? If you’re using this version but still getting errors, can you paste the traceback you get?

import tensorflow as tf

class F1_metric(tf.keras.metrics.Metric):
    def __init__(self, name='f1_score', **kwargs):
        super().__init__(name=name, **kwargs)
        # Initialize our metric by initializing the two metrics it's based on:
        # Precision and Recall
        self.precision = tf.keras.metrics.Precision()
        self.recall = tf.keras.metrics.Recall()

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Update our metric by updating the two metrics it's based on.
        # y_pred holds one logit per class, so convert it to class ids first.
        class_preds = tf.math.argmax(y_pred, axis=1)
        self.precision.update_state(y_true, class_preds, sample_weight)
        self.recall.update_state(y_true, class_preds, sample_weight)

    def reset_state(self):
        # Reset our metric by resetting the two metrics it's based on
        self.precision.reset_state()
        self.recall.reset_state()

    def result(self):
        # To get the F1 result, we compute the harmonic mean of the current
        # precision and recall
        return 2 / ((1 / self.precision.result()) + (1 / self.recall.result()))
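To see why the argmax matters, here is a minimal pure-Python sketch (no TensorFlow) of the same computation. It turns two-column logits into class ids, so predictions line up one-to-one with the labels, and then takes the harmonic mean of precision and recall. The function names and the toy data are made up for illustration and are not part of the notebook.

```python
def argmax_rows(logits):
    """Return the index of the largest value in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def f1_from_predictions(y_true, y_pred):
    """Compute binary F1 as the harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 / ((1 / precision) + (1 / recall))


# Toy data: four examples, two logits each (one per class).
labels = [1, 0, 1, 1]
logits = [[0.1, 0.9], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]]

class_preds = argmax_rows(logits)
print(class_preds)
print(f1_from_predictions(labels, class_preds))
```

Without the argmax step, each prediction would still have two values while each label has one, which is the mismatch reported as "Shapes (None, 2) and (None, 1) are incompatible".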

Updated update_state fixed the error, thanks!

Chapter 3: A full training

I’m trying to run notebook_launcher(training_function) on a Colab TPU, but I’m getting lots of errors, mostly dependency related. At first it was No module named 'accelerate', so I ran pip install accelerate. Now it’s giving me No module named 'torch_xla'.
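One way to surface all missing dependencies at once, rather than hitting them one import error at a time, is to check for them before launching. This is only a sketch: the helper missing_modules is hypothetical, and the list of required packages is taken from the two error messages above, not from any official requirements list.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages named in the error messages above (assumed list, may be incomplete).
required = ["accelerate", "torch_xla"]

missing = missing_modules(required)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required packages are importable.")
```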

When one uses a transformers model for a particular task such as sequence classification, is there any way to first fine-tune the language model for the downstream task before doing sequence classification, as is done with models like ULMFiT? Are there any benefits to doing that for transformer models?

hey @khalidsaifullaah thanks for reporting the bug! i’ll push a fix later today 🙂

hey @harish3110 this is a great question!

indeed, one can always fine-tune the pretrained language model on your corpus before performing the supervised task (e.g. text classification).

in my experience, the gains in accuracy are strongly dependent on how different your corpus is from the one used during pretraining, e.g. if your corpus is something quirky like source code then fine-tuning the language model first can be quite beneficial since the target domain is quite different from say wikipedia.

having said that, what i usually do in practice is run a few supervised fine-tuning experiments first and then think about whether fine-tuning the language model is needed (since it takes longer to train).


@lewtun Ah! It makes sense intuitively to do it, but I haven’t seen it done in practice much. I’ve never come across a Kaggle kernel that did that. Do you have any resources or links that show how one can fine-tune the language model with the HuggingFace library, by any chance?

hey @harish3110 yes this kind of task is definitely something that comes up more in industry than research which is probably why you don’t see many examples online (except of course which is often a few years ahead of the curve on best practices!)

you can check out the language modelling tutorial here which you can adapt for fine-tuning on your corpus / domain.
