Hi!
I saw that @sgugger recently refactored the way transformers integrates with log-visualization tools: https://github.com/huggingface/transformers/pull/7596
As I am running in Azure and using AzureML, I was trying to see if I could do something similar.
Prior to the PR above, I could add a pair of very simple snippets that allowed me to send information to Azure via the azureml.core.Run class (see "Azure Machine Learning Python" on Microsoft Learn).
I tried to replicate the above with the new approach, but I may be missing something obvious.
I created a new callback class in integrations.py
class AzureMLCallback(TrainerCallback):
    def __init__(self, azureml_run=None):
        assert (
            _has_azureml
        ), "AzureMLCallback requires azureml to be installed. Run `pip install azureml-sdk`."
        self.azureml_run = azureml_run

    def on_init_end(self, args, state, control, **kwargs):
        if self.azureml_run is None and state.is_world_process_zero:
            self.azureml_run = Run.get_context()

    def on_log(self, args, logs=None, **kwargs):
        if self.azureml_run:
            for k, v in logs.items():
                if isinstance(v, (int, float)):
                    self.azureml_run.log(k, v, description=k)
and made a handful of other minor changes.
After installing my fork of the library on a machine with
pip install git+https://github.com/davidefiocco/transformers.git@c32718170899d1110a77ab116a2a60bbe326829e --quiet
and running
python run_glue.py --model_name_or_path bert-base-cased \
--task_name CoLA \
--do_train \
--do_eval \
--train_file ./glue_data/CoLA/train.tsv \
--validation_file ./glue_data/CoLA/dev.tsv \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir output \
--evaluation_strategy steps \
--logging_steps 8 \
--eval_steps 4
I get the error:
Traceback (most recent call last):
  File "run_glue.py", line 417, in <module>
    main()
  File "run_glue.py", line 352, in main
    model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 792, in train
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 853, in _maybe_log_save_evaluate
    metrics = self.evaluate()
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 1291, in evaluate
    self.log(output.metrics)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 1044, in log
    self.control = self.callback_handler.on_log(self.args, self.state, self.control, logs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer_callback.py", line 366, in on_log
    return self.call_event("on_log", args, state, control, logs=logs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer_callback.py", line 382, in call_event
    **kwargs,
TypeError: on_log() got multiple values for argument 'logs'
So there's likely something wrong in my AzureMLCallback... can someone help me spot the issue?
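Looking at the last frames of the traceback, call_event passes args, state and control positionally plus logs as a keyword, while my on_log only declares (self, args, logs=None, **kwargs). A minimal standalone sketch (hypothetical class, no transformers imports) seems to reproduce the same collision:

```python
# Hypothetical stand-in for my callback: same on_log signature shape as
# above, i.e. missing the `state` and `control` positional parameters.
class BrokenCallback:
    def on_log(self, args, logs=None, **kwargs):
        return logs


cb = BrokenCallback()
try:
    # call_event invokes the handler roughly like this (see the traceback):
    # the positional `state` lands in the `logs` slot, and then the logs=
    # keyword tries to fill that same slot again.
    cb.on_log("args", "state", "control", logs={"eval_loss": 0.5})
except TypeError as e:
    print(e)  # ... got multiple values for argument 'logs'
```

If that's really the cause, the fix would presumably be to declare on_log(self, args, state, control, logs=None, **kwargs), matching the signatures of the other TrainerCallback methods.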
If you wish to replicate the behavior, you can use this notebook on Google Colab, while the source code is at https://github.com/davidefiocco/transformers/tree/c32718170899d1110a77ab116a2a60bbe326829e