Use this topic for any question about Chapter 3 of the course.
You can look at the official examples:
- the run_translation script
- the Translation notebook, which you can open in Colab.
The course will cover this and other tasks in section 2.
AutoModelForSeq2SeqLM is for encoder-decoder models like T5, BART, mBART, and PEGASUS. You can, for example, initialize an AutoModelForSeq2SeqLM from t5-base, like so:
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
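As a quick usage sketch (the example sentence is just an illustration), you can pair the model with its tokenizer and generate a translation; t5-base expects a task prefix in the input:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

# T5 expects a task prefix for translation
inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))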
EncoderDecoderModel is different: it was created to use encoder-only or decoder-only models in an encoder-decoder setup (this is possible, as was shown in this paper). For example, BERT is an encoder-only model, but you can use it in an encoder-decoder setup as follows:
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel
# Initializing a BERT bert-base-uncased style configuration
config_encoder = BertConfig()
config_decoder = BertConfig()
config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
# Initializing a Bert2Bert model from the bert-base-uncased style configurations
model = EncoderDecoderModel(config=config)
You can also for example use BERT as encoder and GPT-2 as decoder.
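For example, a minimal sketch (using the standard bert-base-uncased and gpt2 checkpoints) that warm-starts such a model from pretrained checkpoints:
from transformers import EncoderDecoderModel

# warm-start an encoder-decoder model from a BERT encoder and a GPT-2 decoder;
# the cross-attention weights are newly initialized, so the model still needs fine-tuning
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "gpt2")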
I got the error below after running the last block of code in the ‘Fine-tuning a model with Keras’ notebook (link):
ValueError Traceback (most recent call last)
<ipython-input-14-d95f2fa5c30c> in <module>()
11 np.array(raw_datasets['train']['label']),
12 validation_data=(tokenized_datasets['validation'], np.array(raw_datasets['validation']['label'])),
---> 13 batch_size=8,
14 )
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1181 _r=1):
1182 callbacks.on_train_batch_begin(step)
-> 1183 tmp_logs = self.train_function(iterator)
1184 if data_handler.should_sync:
1185 context.async_wait()
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
887
888 with OptionalXlaContext(self._jit_compile):
--> 889 result = self._call(*args, **kwds)
890
891 new_tracing_count = self.experimental_get_tracing_count()
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
931 # This is the first call of __call__, so we have to initialize.
932 initializers = []
--> 933 self._initialize(args, kwds, add_initializers_to=initializers)
934 finally:
935 # At this point we know that the initialization is complete (or less
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in _initialize(self, args, kwds, add_initializers_to)
762 self._concrete_stateful_fn = (
763 self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
--> 764 *args, **kwds))
765
766 def invalid_creator_scope(*unused_args, **unused_kwds):
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
3048 args, kwargs = None, None
3049 with self._lock:
-> 3050 graph_function, _ = self._maybe_define_function(args, kwargs)
3051 return graph_function
3052
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
3442
3443 self._function_cache.missed.add(call_context_key)
-> 3444 graph_function = self._create_graph_function(args, kwargs)
3445 self._function_cache.primary[cache_key] = graph_function
3446
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
3287 arg_names=arg_names,
3288 override_flat_arg_shapes=override_flat_arg_shapes,
-> 3289 capture_by_value=self._capture_by_value),
3290 self._function_attributes,
3291 function_spec=self.function_spec,
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
997 _, original_func = tf_decorator.unwrap(python_func)
998
--> 999 func_outputs = python_func(*func_args, **func_kwargs)
1000
1001 # invariant: `func_outputs` contains only Tensors, CompositeTensors,
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
670 # the function a weak reference to itself to avoid a reference cycle.
671 with OptionalXlaContext(compile_with_xla):
--> 672 out = weak_wrapped_fn().__wrapped__(*args, **kwds)
673 return out
674
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
984 except Exception as e: # pylint:disable=broad-except
985 if hasattr(e, "ag_error_metadata"):
--> 986 raise e.ag_error_metadata.to_exception(e)
987 else:
988 raise
ValueError: in user code:
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:855 train_function *
return step_function(self, iterator)
<ipython-input-12-112004023d94>:11 update_state *
self.precision.update_state(y_true, y_pred, sample_weight)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/metrics_utils.py:86 decorated **
update_op = update_state_fn(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/metrics.py:177 update_state_fn
return ag_update_state(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/metrics.py:1337 update_state **
sample_weight=sample_weight)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/metrics_utils.py:366 update_confusion_matrix_variables
y_pred.shape.assert_is_compatible_with(y_true.shape)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/tensor_shape.py:1161 assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (None, 2) and (None, 1) are incompatible
Pinging @Rocketknight1
Hi, I’ve reproduced it - it’s an issue caused by an older version of the F1 metric code being included in the final notebook by accident. My fault, I’m going to quickly test and push a fix now.
@Photons thank you for catching that bug!
For the code example in this chapter, using the Trainer API in Colab runs on the GPU, but the same script in my own Python virtual environment trains on the CPU.
This isn’t a problem with the full training script using native PyTorch (same virtual environment), where we explicitly define the device and move the model and batches to it.
Can you provide some suggestions on how to diagnose why the GPU isn’t detected with the Trainer API? Thanks.
The Trainer checks torch.cuda.is_available() behind the scenes, so you should just inspect the result of that function.
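As a quick diagnostic sketch (plain PyTorch, nothing Trainer-specific) you can run in your virtual environment:
import torch

print(torch.__version__)          # make sure this is a CUDA build, not a CPU-only one
print(torch.cuda.is_available())  # the Trainer falls back to CPU when this returns False
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the detected GPU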
Hi Matt,
How does the fix manifest? Today I’m having the exact same problem as Photons above. I’m replicating all of your code in my own notebook, and it’s not apparent to me what I need to update to bring in your fix. Thank you!
-Mike
Hi @feynmanm, I’ve linked the updated F1 metric code below, which should be in the Colab notebook now. Can you confirm your Metric class looks like this (the update_state method has been changed)? If you’re using this version but still getting errors, can you paste the traceback you get?
class F1_metric(tf.keras.metrics.Metric):
    def __init__(self, name='f1_score', **kwargs):
        super().__init__(name=name, **kwargs)
        # Initialize our metric by initializing the two metrics it's based on:
        # Precision and Recall
        self.precision = tf.keras.metrics.Precision()
        self.recall = tf.keras.metrics.Recall()

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Update our metric by updating the two metrics it's based on
        class_preds = tf.math.argmax(y_pred, axis=1)
        self.precision.update_state(y_true, class_preds, sample_weight)
        self.recall.update_state(y_true, class_preds, sample_weight)

    def reset_state(self):
        self.precision.reset_state()
        self.recall.reset_state()

    def result(self):
        # To get the F1 result, we compute the harmonic mean of the current
        # precision and recall
        return 2 / ((1 / self.precision.result()) + (1 / self.recall.result()))
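For reference, a rough usage sketch (the checkpoint and hyperparameters here are just placeholders) showing how the metric gets passed to model.compile:
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

# hypothetical two-label classifier; the loss and the metric both work on the raw logits
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[F1_metric()],
)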
Updated update_state fixed the error, thanks!
Chapter 3: A full training
I’m trying to run notebook_launcher(training_function) on the Colab TPU, but I’m getting lots of errors, mostly dependency related. At first it was No module named 'accelerate', so I did pip install accelerate. Now it’s giving me No module named 'torch_xla' …
When one uses a transformers model for a particular task like sequence classification, is there any way to perform something like fine-tuning the language model for the downstream task before doing sequence classification, as in models like ULMFiT? Are there any benefits to doing that for transformer models?
hey @khalidsaifullaah thanks for reporting the bug! i’ll push a fix later today
hey @harish3110 this is a great question!
indeed, you can always fine-tune the pretrained language model on your corpus before performing the supervised task (e.g. text classification).
in my experience, the gains in accuracy depend strongly on how different your corpus is from the one used during pretraining. for example, if your corpus is something quirky like source code, fine-tuning the language model first can be quite beneficial, since the target domain is quite different from, say, wikipedia.
having said that, what i usually do in practice is run a few supervised fine-tuning experiments first and only then think about whether fine-tuning the language model is needed (since it takes longer to train).
hth!
@lewtun Ah! It makes sense intuitively to do it, but I haven’t seen it done in practice much; I’ve never come across a Kaggle kernel that did that. Do you have any resources or links that show how one can fine-tune with the HuggingFace library, by any chance?
hey @harish3110 yes, this kind of task is definitely something that comes up more in industry than in research, which is probably why you don’t see many examples online (except of course fast.ai, which is often a few years ahead of the curve on best practices!)
you can check out the language modelling tutorial here which you can adapt for fine-tuning on your corpus / domain.
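to give a flavour, here’s a minimal sketch of domain-adaptive pretraining with the masked language modelling objective (the corpus file my_corpus.txt, the checkpoint, and the hyperparameters are just placeholders):
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# hypothetical plain-text corpus from your own domain
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})

checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# randomly mask 15% of the tokens so the model learns your domain's vocabulary in context
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments("domain-adapted-bert", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=data_collator,
)
trainer.train()
# then use this directory as the starting checkpoint for the downstream classification fine-tuning
trainer.save_model("domain-adapted-bert")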
Hey,
I have a question regarding Chapter 3.3: Fine-tuning a model with the Trainer API:
Why was the prediction made on the validation set and not on an additional test set which wasn’t used during training?
hey @realjanpaulus, we did this for simplicity, but you can also use the Trainer.evaluate function to compute the metrics on a held-out test set quite easily:
test_dataset = ...
test_metrics = trainer.evaluate(test_dataset)
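if you want the raw predictions rather than just the metrics, a similar sketch (reusing the hypothetical test_dataset above) with Trainer.predict:
predictions = trainer.predict(test_dataset)
print(predictions.predictions.shape)  # logits for each test example
print(predictions.label_ids.shape)    # ground-truth labels
print(predictions.metrics)            # the same metrics dict that Trainer.evaluate returns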
Ah thank you