Hi!
I saw that @sgugger recently refactored the way transformers integrates with log-visualization tools: https://github.com/huggingface/transformers/pull/7596
As I am running in Azure and using AzureML, I was trying to see if I could do something similar.
Prior to the PR above, I could add a pair of very simple snippets that allowed me to send information to Azure via the azureml.core.Run class (see "Azure Machine Learning Python" on Microsoft Learn).
I tried to replicate the above with the new approach, but I may be missing something obvious.
I created a new callback class in integrations.py
class AzureMLCallback(TrainerCallback):
    def __init__(self, azureml_run=None):
        assert (
            _has_azureml
        ), "AzureMLCallback requires azureml to be installed. Run `pip install azureml-sdk`."
        self.azureml_run = azureml_run

    def on_init_end(self, args, state, control, **kwargs):
        if self.azureml_run is None and state.is_world_process_zero:
            self.azureml_run = Run.get_context()

    def on_log(self, args, logs=None, **kwargs):
        if self.azureml_run:
            for k, v in logs.items():
                if isinstance(v, (int, float)):
                    self.azureml_run.log(k, v, description=k)
and made a handful of other minor changes.
After installing my fork of the library on a machine with
pip install git+https://github.com/davidefiocco/transformers.git@c32718170899d1110a77ab116a2a60bbe326829e --quiet
and running
python run_glue.py --model_name_or_path bert-base-cased \
--task_name CoLA \
--do_train \
--do_eval \
--train_file ./glue_data/CoLA/train.tsv \
--validation_file ./glue_data/CoLA/dev.tsv \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir output \
--evaluation_strategy steps \
--logging_steps 8 \
--eval_steps 4
I get the error:
Traceback (most recent call last):
  File "run_glue.py", line 417, in <module>
    main()
  File "run_glue.py", line 352, in main
    model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 792, in train
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 853, in _maybe_log_save_evaluate
    metrics = self.evaluate()
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 1291, in evaluate
    self.log(output.metrics)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 1044, in log
    self.control = self.callback_handler.on_log(self.args, self.state, self.control, logs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer_callback.py", line 366, in on_log
    return self.call_event("on_log", args, state, control, logs=logs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer_callback.py", line 382, in call_event
    **kwargs,
TypeError: on_log() got multiple values for argument 'logs'
So there's likely something wrong in my AzureMLCallback... can someone help me spot the issue?
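Looking at the last frames of the traceback, call_event passes args, state and control positionally plus logs as a keyword, while my on_log only declares (self, args, logs=None, **kwargs). A minimal standalone sketch (hypothetical class, no transformers imports) seems to reproduce the same collision:

```python
# Hypothetical stand-in for my callback: same on_log signature shape as
# above, i.e. missing the `state` and `control` positional parameters.
class BrokenCallback:
    def on_log(self, args, logs=None, **kwargs):
        return logs


cb = BrokenCallback()
try:
    # call_event invokes the handler roughly like this (see the traceback):
    # the positional `state` lands in the `logs` slot, and then the logs=
    # keyword tries to fill that same slot again.
    cb.on_log("args", "state", "control", logs={"eval_loss": 0.5})
except TypeError as e:
    print(e)  # ... got multiple values for argument 'logs'
```

If that's really the cause, the fix would presumably be to declare on_log(self, args, state, control, logs=None, **kwargs), matching the signatures of the other TrainerCallback methods.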
If you wish to replicate the behavior, you can use this notebook on Google Colab, while the source code is at https://github.com/davidefiocco/transformers/tree/c32718170899d1110a77ab116a2a60bbe326829e