How to integrate an AzureMLCallback for logging in Azure?

Hi!

I saw that @sgugger recently refactored the way in which transformers integrates with tools to visualize logs in a more helpful way: https://github.com/huggingface/transformers/pull/7596

As I am running in Azure and using AzureML, I was trying to see if I could do something similar.
Prior to the PR above, I could add a pair of very simple snippets that allowed me to send information to Azure via https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run(class)?view=azure-ml-py#log-name--value--description----

I tried to replicate the above with the new approach, but I may be missing something obvious.
I created a new callback class in integrations.py

class AzureMLCallback(TrainerCallback):

    def __init__(self, azureml_run=None):
        assert (
            _has_azureml
        ), "AzureMLCallback requires azureml to be installed. Run `pip install azureml-sdk`."
        self.azureml_run = azureml_run

    def on_init_end(self, args, state, control, **kwargs):
        if self.azureml_run is None and state.is_world_process_zero:
            self.azureml_run = Run.get_context()

    def on_log(self, args, logs=None, **kwargs):
        if self.azureml_run:
            for k, v in logs.items():
                if isinstance(v, (int, float)):
                    self.azureml_run.log(k, v, description=k)

along with a few other minor changes.

After installing my fork of the library on a machine with

pip install git+https://github.com/davidefiocco/transformers.git@c32718170899d1110a77ab116a2a60bbe326829e --quiet 

and running

python run_glue.py --model_name_or_path bert-base-cased \
                    --task_name CoLA \
                    --do_train \
                    --do_eval \
                    --train_file ./glue_data/CoLA/train.tsv \
                    --validation_file ./glue_data/CoLA/dev.tsv \
                    --max_seq_length 128 \
                    --per_device_train_batch_size 32 \
                    --learning_rate 2e-5 \
                    --num_train_epochs 3.0 \
                    --output_dir output \
                    --evaluation_strategy steps \
                    --logging_steps 8 \
                    --eval_steps 4

I get the error:

Traceback (most recent call last):
  File "run_glue.py", line 417, in <module>
    main()
  File "run_glue.py", line 352, in main
    model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 792, in train
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 853, in _maybe_log_save_evaluate
    metrics = self.evaluate()
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 1291, in evaluate
    self.log(output.metrics)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 1044, in log
    self.control = self.callback_handler.on_log(self.args, self.state, self.control, logs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer_callback.py", line 366, in on_log
    return self.call_event("on_log", args, state, control, logs=logs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer_callback.py", line 382, in call_event
    **kwargs,
TypeError: on_log() got multiple values for argument 'logs'

So there’s likely something wrong in my AzureMLCallback… can someone help me spot the issue?

If you wish to replicate the behavior you can use this notebook https://colab.research.google.com/gist/davidefiocco/416c382cd51ad58cabf3eb940c040220/azureml-logging-on-transformers.ipynb while the source code is https://github.com/davidefiocco/transformers/tree/c32718170899d1110a77ab116a2a60bbe326829e

Hi there! Glad to see you try the new callbacks! The mistake is that you left out state and control, which are positional arguments. Just replace your on_log definition with:

def on_log(self, args, state, control, logs=None, **kwargs):

and you’ll be fine!
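To sanity-check the fix without the AzureML SDK installed, here is a minimal sketch of the corrected callback. `FakeRun` is a hypothetical stand-in for `azureml.core.Run`, and the `TrainerCallback` base class is omitted so the snippet has no dependencies; the on_log body is the one from the post above, with the repaired signature.

```python
class FakeRun:
    """Hypothetical stand-in for azureml.core.Run; records logged metrics."""

    def __init__(self):
        self.logged = []

    def log(self, name, value, description=""):
        self.logged.append((name, value))


class AzureMLCallback:
    # In real code this would subclass transformers.TrainerCallback.
    def __init__(self, azureml_run=None):
        self.azureml_run = azureml_run

    # state and control must stay in the signature: the callback handler
    # passes them positionally before the logs keyword argument.
    def on_log(self, args, state, control, logs=None, **kwargs):
        if self.azureml_run:
            for k, v in logs.items():
                if isinstance(v, (int, float)):
                    self.azureml_run.log(k, v, description=k)


run = FakeRun()
cb = AzureMLCallback(azureml_run=run)
cb.on_log(args=None, state=None, control=None,
          logs={"loss": 0.5, "step": 8, "note": "text"})
print(run.logged)  # only the numeric values are forwarded
```

With the correct signature, the handler's positional call no longer collides with the `logs=` keyword, and only numeric metrics reach the run.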


Indeed! Aw, I don't know why I messed up the function signature when copying the available examples! :man_facepalming:

Thanks for spotting that and for the new tricks!

might be interesting to add this snippet to https://github.com/huggingface/transformers/blob/master/src/transformers/integrations.py @davidefiocco

Cool @julien-c ! I will review a couple of things and aim to send a PR to you by this week.