Use this topic for any question about Chapter 3 of the course.
Hi @sgugger, thank you for the informative course. Do you have any documentation or tutorials for fine-tuning a pretrained model on a translation task? Thank you very much.
You can look at the official examples:
The course will cover this and other tasks in section 2.
Thank you very much. Can I ask what the difference is between AutoModelForSeq2SeqLM and EncoderDecoderModel? Or are they the same thing, just called in a different way?
AutoModelForSeq2SeqLM is for encoder-decoder models like T5, BART, mBART, PEGASUS. For example, you can initialize a t5-base like so:
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
EncoderDecoderModel is different: it was created to use encoder-only or decoder-only models in an encoder-decoder setup (this is possible, as was shown in this paper). For example, BERT is an encoder-only model, but you can use it in an encoder-decoder setup as follows:
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# Initializing a BERT bert-base-uncased style configuration
config_encoder = BertConfig()
config_decoder = BertConfig()

config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)

# Initializing a Bert2Bert model from the bert-base-uncased style configurations
model = EncoderDecoderModel(config=config)
You can also for example use BERT as encoder and GPT-2 as decoder.
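To make the BERT-plus-GPT-2 combination concrete, here is a minimal sketch that follows the same config-based pattern as the Bert2Bert snippet. The tiny layer sizes are made up for illustration so the model builds quickly; they are not the sizes of bert-base-uncased or gpt2, and the weights are randomly initialized.

```python
from transformers import (
    BertConfig,
    EncoderDecoderConfig,
    EncoderDecoderModel,
    GPT2Config,
)

# Hypothetical, deliberately tiny sizes (assumption for this sketch only).
config_encoder = BertConfig(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
)
config_decoder = GPT2Config(n_embd=64, n_layer=2, n_head=2)

# Combining the configs flags the GPT-2 config as a decoder with cross-attention.
config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)

# Randomly initialized BERT encoder + GPT-2 decoder.
model = EncoderDecoderModel(config=config)

encoder_type = type(model.encoder).__name__
decoder_type = type(model.decoder).__name__
print(encoder_type, decoder_type)
```

To start from pretrained weights instead of random ones, EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "gpt2") should do the equivalent; the cross-attention weights are newly initialized in that case, so the combined model still needs fine-tuning.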
Thank you very much. I will try both of them
I got the error below after running the last block of code in the ‘Fine-tuning a model with Keras’ notebook:
ValueError                                Traceback (most recent call last)
<ipython-input-14-d95f2fa5c30c> in <module>()
     11 np.array(raw_datasets['train']['label']),
     12 validation_data=(tokenized_datasets['validation'], np.array(raw_datasets['validation']['label'])),
---> 13 batch_size=8,
     14 )

9 frames

/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1181 _r=1):
   1182 callbacks.on_train_batch_begin(step)
-> 1183 tmp_logs = self.train_function(iterator)
   1184 if data_handler.should_sync:
   1185 context.async_wait()

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    887
    888 with OptionalXlaContext(self._jit_compile):
--> 889 result = self._call(*args, **kwds)
    890
    891 new_tracing_count = self.experimental_get_tracing_count()

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    931 # This is the first call of __call__, so we have to initialize.
    932 initializers = []
--> 933 self._initialize(args, kwds, add_initializers_to=initializers)
    934 finally:
    935 # At this point we know that the initialization is complete (or less

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in _initialize(self, args, kwds, add_initializers_to)
    762 self._concrete_stateful_fn = (
    763 self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
--> 764 *args, **kwds))
    765
    766 def invalid_creator_scope(*unused_args, **unused_kwds):

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
   3048 args, kwargs = None, None
   3049 with self._lock:
-> 3050 graph_function, _ = self._maybe_define_function(args, kwargs)
   3051 return graph_function
   3052

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
   3442
   3443 self._function_cache.missed.add(call_context_key)
-> 3444 graph_function = self._create_graph_function(args, kwargs)
   3445 self._function_cache.primary[cache_key] = graph_function
   3446

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   3287 arg_names=arg_names,
   3288 override_flat_arg_shapes=override_flat_arg_shapes,
-> 3289 capture_by_value=self._capture_by_value),
   3290 self._function_attributes,
   3291 function_spec=self.function_spec,

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    997 _, original_func = tf_decorator.unwrap(python_func)
    998
--> 999 func_outputs = python_func(*func_args, **func_kwargs)
   1000
   1001 # invariant: `func_outputs` contains only Tensors, CompositeTensors,

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
    670 # the function a weak reference to itself to avoid a reference cycle.
    671 with OptionalXlaContext(compile_with_xla):
--> 672 out = weak_wrapped_fn().__wrapped__(*args, **kwds)
    673 return out
    674

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    984 except Exception as e: # pylint:disable=broad-except
    985 if hasattr(e, "ag_error_metadata"):
--> 986 raise e.ag_error_metadata.to_exception(e)
    987 else:
    988 raise

ValueError: in user code:

    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:855 train_function *
        return step_function(self, iterator)
    <ipython-input-12-112004023d94>:11 update_state *
        self.precision.update_state(y_true, y_pred, sample_weight)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/metrics_utils.py:86 decorated **
        update_op = update_state_fn(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/metrics.py:177 update_state_fn
        return ag_update_state(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/metrics.py:1337 update_state **
        sample_weight=sample_weight)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/metrics_utils.py:366 update_confusion_matrix_variables
        y_pred.shape.assert_is_compatible_with(y_true.shape)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/tensor_shape.py:1161 assert_is_compatible_with
        raise ValueError("Shapes %s and %s are incompatible" % (self, other))

    ValueError: Shapes (None, 2) and (None, 1) are incompatible
Hi, I’ve reproduced it - it’s an issue caused by an older version of the F1 metric code being included in the final notebook by accident. My fault, I’m going to quickly test and push a fix now.
@Photons thank you for catching that bug!
The code example in this chapter that uses the Trainer API runs on the GPU in Colab, but the same script trains on the CPU in my own Python virtual environment.
This isn’t a problem, however, with the full training script using native PyTorch (same Python virtual environment), where we explicitly define the device and move the model and each batch to it.
Can you suggest how to diagnose why the Trainer API is not detecting the GPU? Thanks.
The Trainer calls torch.cuda.is_available() behind the scenes, so you should just inspect the result of that function.
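A minimal sketch of that check, run from inside the same virtual environment as the training script. The helper name describe_torch_device is made up for this example; the calls it relies on (torch.cuda.is_available(), torch.cuda.device_count(), torch.cuda.get_device_name() and torch.version.cuda) are standard PyTorch.

```python
import importlib.util


def describe_torch_device():
    """Return a short report on which device PyTorch can see (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"

    import torch

    if torch.cuda.is_available():
        count = torch.cuda.device_count()
        return f"CUDA available: {count} GPU(s), first is {torch.cuda.get_device_name(0)}"

    # torch.version.cuda is None when the installed wheel was built without CUDA.
    if torch.version.cuda is None:
        return "PyTorch build without CUDA support is installed, so no GPU can be used"

    return (
        f"PyTorch was built for CUDA {torch.version.cuda}, "
        "but no usable GPU or driver was found"
    )


report = describe_torch_device()
print(report)
```

If the report says the build has no CUDA support, the usual cause is that pip pulled a CPU-only wheel into the virtual environment, which would explain why the same script uses the GPU in Colab but not locally.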
How does the fix manifest? Today I’m having the exact same problem as Photons above. I’m replicating all of your code in my own notebook, and it’s not apparent to me what I need to update to bring in your fix. Thank you!
Hi @feynmanm, I’ve linked the updated F1 metric code below, which should be in the Colab notebook now. Can you confirm your Metric class looks like this (the update_state method has been changed)? If you’re using this version but still getting errors, can you paste the traceback you get?
import tensorflow as tf


class F1_metric(tf.keras.metrics.Metric):
    def __init__(self, name='f1_score', **kwargs):
        super().__init__(name=name, **kwargs)
        # Initialize our metric by initializing the two metrics it's based on:
        # Precision and Recall
        self.precision = tf.keras.metrics.Precision()
        self.recall = tf.keras.metrics.Recall()

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Update our metric by updating the two metrics it's based on.
        # The model outputs one score per class, so reduce to a class index first.
        class_preds = tf.math.argmax(y_pred, axis=1)
        self.precision.update_state(y_true, class_preds, sample_weight)
        self.recall.update_state(y_true, class_preds, sample_weight)

    def reset_state(self):
        self.precision.reset_state()
        self.recall.reset_state()

    def result(self):
        # To get the F1 result, we compute the harmonic mean of the current
        # precision and recall
        return 2 / ((1 / self.precision.result()) + (1 / self.recall.result()))
Updated update_state fixed the error, thanks!
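For anyone wondering why the argmax matters: the model emits two scores per example (shape (None, 2)) while the labels are a single class id per example, which is what the "Shapes (None, 2) and (None, 1) are incompatible" message is complaining about. Here is a small pure-Python sketch of the same arithmetic, with made-up scores and labels and hypothetical helper names, independent of TensorFlow:

```python
def argmax(row):
    """Index of the largest value in a list of per-class scores."""
    return max(range(len(row)), key=lambda i: row[i])


def f1_from_predictions(y_true, y_pred):
    """Binary F1 as the harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 or recall == 0.0:
        return 0.0  # avoid dividing by zero in the harmonic mean
    return 2 / ((1 / precision) + (1 / recall))


# Made-up two-class scores: one row per example, one column per class.
logits = [[0.2, 1.5], [2.0, -1.0], [0.1, 0.3], [1.2, 0.4]]
labels = [1, 0, 0, 1]

# Reduce each row of scores to a single class id so it lines up with the labels.
class_preds = [argmax(row) for row in logits]
f1 = f1_from_predictions(labels, class_preds)
print(class_preds, f1)
```

Unlike the Keras metric, this sketch returns 0.0 explicitly when precision or recall is zero; that guard is a choice made for the example, not part of the notebook code.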
Chapter 3: A full training
I’m trying to run notebook_launcher(training_function) on the Colab TPU, but I’m getting lots of errors, mostly dependency related. At first I got No module named 'accelerate', so I ran pip install accelerate. Now it’s giving me No module named 'torch_xla'…
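One way to see every missing dependency up front, rather than hitting them one import error at a time, is to ask Python which modules it can find before launching. A minimal sketch, assuming the three module names below are the ones the TPU launch needs (the list is a guess based on the errors above, and missing_modules is a made-up helper):

```python
import importlib.util

# Assumed list of modules for a TPU launch; adjust it to match your notebook.
REQUIRED = ["accelerate", "torch", "torch_xla"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable")
```

Note that torch_xla is a separate package from accelerate, so installing accelerate on its own does not bring it in.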
When one uses a transformers model for a particular task such as sequence classification, is there any way to first fine-tune the language model on the downstream corpus before doing sequence classification, as is done with models like ULMFiT? Are there any benefits to doing that for transformer models?
hey @khalidsaifullaah thanks for reporting the bug! i’ll push a fix later today
hey @harish3110 this is a great question!
indeed, one can always fine-tune the pretrained language model on your corpus before performing the supervised task (e.g. text classification).
in my experience, the gains in accuracy are strongly dependent on how different your corpus is from the one used during pretraining, e.g. if your corpus is something quirky like source code then fine-tuning the language model first can be quite beneficial since the target domain is quite different from say wikipedia.
having said that, what i usually do in practice is run a few supervised fine-tuning experiments first and then think about whether fine-tuning the language model is needed (since it takes longer to train).
@lewtun Ah! It makes sense intuitively, but I haven’t seen it done in practice much. I’ve never come across a Kaggle kernel that did that. Do you have any resources or links that show how one can fine-tune the language model with the HuggingFace library, by any chance?
hey @harish3110 yes this kind of task is definitely something that comes up more in industry than research which is probably why you don’t see many examples online (except of course fast.ai which is often a few years ahead of the curve on best practices!)
you can check out the language modelling tutorial here which you can adapt for fine-tuning on your corpus / domain.