Hi,
I am trying to fine-tune a BERT model on a custom dataset with 3 classes. I've followed the Fine-tune a pretrained model tutorial and slightly adapted it to my needs.
However, when I run it, I get the following error after the first epoch:
{'eval_loss': 0.847770631313324, 'eval_f1': array([0. , 0.88, 0. ]), 'eval_runtime': 0.02, 'eval_samples_per_second': 699.017, 'eval_steps_per_second': 99.86, 'epoch': 1.0}
TypeError: Object of type ndarray is not JSON serializable
Full stack trace:
{'eval_loss': 0.847770631313324, 'eval_f1': array([0. , 0.88, 0. ]), 'eval_runtime': 0.02, 'eval_samples_per_second': 699.017, 'eval_steps_per_second': 99.86, 'epoch': 1.0}
Model weights saved in ../../base_dir/finetuned_gnd_local/checkpoint-31/pytorch_model.bin
tokenizer config file saved in ../../base_dir/finetuned_gnd_local/checkpoint-31/tokenizer_config.json
Special tokens file saved in ../../base_dir/finetuned_gnd_local/checkpoint-31/special_tokens_map.json
Traceback (most recent call last):
File ".../finetune_bert.py", line 222, in <module>
finetune_bert(data_path=args.data_path,
File ".../finetune_bert.py", line 216, in finetune_bert
trainer.train()
File ".../venv/lib/python3.10/site-packages/transformers/trainer.py", line 1498, in train
return inner_training_loop(
File ".../venv/lib/python3.10/site-packages/transformers/trainer.py", line 1832, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File ".../venv/lib/python3.10/site-packages/transformers/trainer.py", line 2042, in _maybe_log_save_evaluate
self._save_checkpoint(model, trial, metrics=metrics)
File ".../venv/lib/python3.10/site-packages/transformers/trainer.py", line 2173, in _save_checkpoint
self.state.save_to_json(os.path.join(output_dir, TRAINER_STATE_NAME))
File ".../venv/lib/python3.10/site-packages/transformers/trainer_callback.py", line 97, in save_to_json
json_string = json.dumps(dataclasses.asdict(self), indent=2, sort_keys=True) + "\n"
File "/usr/lib/python3.10/json/__init__.py", line 238, in dumps
**kw).encode(obj)
File "/usr/lib/python3.10/json/encoder.py", line 201, in encode
chunks = list(chunks)
File "/usr/lib/python3.10/json/encoder.py", line 431, in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/usr/lib/python3.10/json/encoder.py", line 325, in _iterencode_list
yield from chunks
File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/usr/lib/python3.10/json/encoder.py", line 438, in _iterencode
o = _default(o)
File "/usr/lib/python3.10/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ndarray is not JSON serializable
The problem only occurs when I use the F1 score as a metric with average=None.
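For reference, my compute_metrics boils down to something like this (simplified; f1_score is from scikit-learn):

```python
import numpy as np
from sklearn.metrics import f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # average=None makes f1_score return one score per class, as a numpy ndarray
    return {"f1": f1_score(labels, predictions, average=None)}
```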
So I believe this comes down to a bit of a misunderstanding on my part. I'm assuming the compute_metrics()
function is used by the Trainer to track how the model is performing, and it therefore doesn't like getting an array of per-class F1 scores?
If that is the case, could someone explain to me, or point me towards a guide on, how I should properly log the training results that interest me? In my case I would like to see the per-class F1 scores so I can understand which classes my model is struggling with, though I may also be interested in other metrics in the future.
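One idea I had (I'm not sure it's the intended way) is to unpack the array into one plain Python float per class before returning it, so the metrics dict only contains JSON-serializable scalars; the f1_class_{i} key names here are just my own naming:

```python
import numpy as np
from sklearn.metrics import f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    per_class = f1_score(labels, predictions, average=None)
    # one scalar entry per class, so trainer_state.json can be serialized
    metrics = {f"f1_class_{i}": float(score) for i, score in enumerate(per_class)}
    # keep a single aggregate score too, e.g. for load_best_model_at_end
    metrics["f1_macro"] = float(f1_score(labels, predictions, average="macro"))
    return metrics
```

That at least avoids putting an ndarray into the logged dict, but I don't know if it's the recommended pattern.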
Could anyone shed some light on this, please?