Hello, I am trying to fine-tune BERT on a classification task, but I am getting the following error during training.
Saving model checkpoint to /gpfswork/rech/kpf/umg16uw/results_hf/checkpoint-500
Configuration saved in /gpfswork/rech/kpf/umg16uw/results_hf/checkpoint-500/config.json
Model weights saved in /gpfswork/rech/kpf/umg16uw/results_hf/checkpoint-500/pytorch_model.bin
Traceback (most recent call last):
File "/gpfs7kw/linkhome/rech/genlig01/umg16uw/test/expe_5/traitements/Flaubert_huggingface.py", line 225, in <module>
train_results = trainer.train()
File "/linkhome/rech/genlig01/umg16uw/.conda/envs/bert/lib/python3.9/site-packages/transformers/trainer.py", line 1325, in train
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch)
File "/linkhome/rech/genlig01/umg16uw/.conda/envs/bert/lib/python3.9/site-packages/transformers/trainer.py", line 1422, in _maybe_log_save_evaluate
self._save_checkpoint(model, trial, metrics=metrics)
File "/linkhome/rech/genlig01/umg16uw/.conda/envs/bert/lib/python3.9/site-packages/transformers/trainer.py", line 1537, in _save_checkpoint
self.state.save_to_json(os.path.join(output_dir, "trainer_state.json"))
File "/linkhome/rech/genlig01/umg16uw/.conda/envs/bert/lib/python3.9/site-packages/transformers/trainer_callback.py", line 96, in save_to_json
json_string = json.dumps(dataclasses.asdict(self), indent=2, sort_keys=True) + "\n"
File "/linkhome/rech/genlig01/umg16uw/.conda/envs/bert/lib/python3.9/json/__init__.py", line 234, in dumps
return cls(
File "/linkhome/rech/genlig01/umg16uw/.conda/envs/bert/lib/python3.9/json/encoder.py", line 201, in encode
chunks = list(chunks)
File "/linkhome/rech/genlig01/umg16uw/.conda/envs/bert/lib/python3.9/json/encoder.py", line 431, in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
File "/linkhome/rech/genlig01/umg16uw/.conda/envs/bert/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/linkhome/rech/genlig01/umg16uw/.conda/envs/bert/lib/python3.9/json/encoder.py", line 325, in _iterencode_list
yield from chunks
File "/linkhome/rech/genlig01/umg16uw/.conda/envs/bert/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/linkhome/rech/genlig01/umg16uw/.conda/envs/bert/lib/python3.9/json/encoder.py", line 438, in _iterencode
o = _default(o)
File "/linkhome/rech/genlig01/umg16uw/.conda/envs/bert/lib/python3.9/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ndarray is not JSON serializable
76%|███████▋ | 500/654 [02:52<00:52, 2.91it/s]
srun: error: r10i6n1: task 0: Exited with exit code 1
srun: Terminating job step 1775050.0
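From the traceback, the crash happens when the Trainer writes trainer_state.json: json.dumps cannot serialize NumPy arrays, and my per-class metrics (the array([...]) values in the training output below) end up in the trainer state. Here is a minimal sketch of what I think is happening, using sklearn's f1_score with average=None as an example (this is not my exact code):

```python
import json
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 1, 1, 0, 2]

# average=None returns one score per class, as a NumPy ndarray
per_class_f1 = f1_score(y_true, y_pred, average=None)
print(type(per_class_f1))  # <class 'numpy.ndarray'>

# The Trainer later dumps its logged metrics to trainer_state.json;
# a plain json.dumps fails on the ndarray exactly like my run does:
try:
    json.dumps({"eval_f1": per_class_f1})
except TypeError as e:
    print(e)  # Object of type ndarray is not JSON serializable
```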
Output file:
file in training… /gpfs7kw/linkhome/rech/genlig01/umg16uw/test/expe_5/dataset/train_corpus/train_80tr_moins_20t/80tr/corpusIxAug_et_Or80tr.xlsx
Filename in processed… corpusIxAug_et_Or80tr
Number of sentences 18568.00…
Type of preprocessing… verbatim
Train : 13926 Val : 4642
{'loss': 1.0099, 'learning_rate': 4.9923547400611625e-05, 'epoch': 0.0}
{'loss': 0.872, 'learning_rate': 4.235474006116208e-05, 'epoch': 0.46}
{'eval_loss': 0.5922592878341675, 'eval_accuracy': 0.7615252046531668, 'eval_f1': array([0.64872657, 0.83726867, 0.71302958]), 'eval_precision': array([0.62674095, 0.80523732, 0.79571811]), 'eval_recall': array([0.67231076, 0.87195392, 0.64590876]), 'eval_f1_mi': 0.7615252046531666, 'eval_precision_mi': 0.7615252046531668, 'eval_recall_mi': 0.7615252046531668, 'eval_f1_ma': 0.733008272114256, 'eval_precision_ma': 0.7425654572607411, 'eval_recall_ma': 0.7300578132910654, 'eval_runtime': 9.6426, 'eval_samples_per_second': 481.407, 'eval_steps_per_second': 7.571, 'epoch': 0.46}
{'loss': 0.6014, 'learning_rate': 3.4709480122324164e-05, 'epoch': 0.92}
{'eval_loss': 0.30887845158576965, 'eval_accuracy': 0.8862559241706162, 'eval_f1': array([0.80689306, 0.92978868, 0.8742268 ]), 'eval_precision': array([0.82146543, 0.95429104, 0.83191629]), 'eval_recall': array([0.79282869, 0.90651307, 0.92107169]), 'eval_f1_mi': 0.8862559241706162, 'eval_precision_mi': 0.8862559241706162, 'eval_recall_mi': 0.8862559241706162, 'eval_f1_ma': 0.8703028482577086, 'eval_precision_ma': 0.8692242527354628, 'eval_recall_ma': 0.873471147629887, 'eval_runtime': 9.6181, 'eval_samples_per_second': 482.632, 'eval_steps_per_second': 7.59, 'epoch': 0.92}
{'loss': 0.3815, 'learning_rate': 2.7064220183486238e-05, 'epoch': 1.38}
{'eval_loss': 0.16964389383792877, 'eval_accuracy': 0.9467901766479966, 'eval_f1': array([0.9054878 , 0.96444059, 0.94674556]), 'eval_precision': array([0.92427386, 0.94437367, 0.96749811]), 'eval_recall': array([0.8874502 , 0.98537882, 0.92686459]), 'eval_f1_mi': 0.9467901766479966, 'eval_precision_mi': 0.9467901766479966, 'eval_recall_mi': 0.9467901766479966, 'eval_f1_ma': 0.9388913189246848, 'eval_precision_ma': 0.9453818807708362, 'eval_recall_ma': 0.933231203841253, 'eval_runtime': 9.402, 'eval_samples_per_second': 493.727, 'eval_steps_per_second': 7.764, 'epoch': 1.38}
{'loss': 0.2669, 'learning_rate': 1.9418960244648318e-05, 'epoch': 1.83}
{'eval_loss': 0.10839153826236725, 'eval_accuracy': 0.9648858250753986, 'eval_f1': array([0.93973442, 0.97762021, 0.96196232]), 'eval_precision': array([0.96436059, 0.97783688, 0.9448324 ]), 'eval_recall': array([0.91633466, 0.97740363, 0.97972484]), 'eval_f1_mi': 0.9648858250753986, 'eval_precision_mi': 0.9648858250753986, 'eval_recall_mi': 0.9648858250753986, 'eval_f1_ma': 0.9597723163259425, 'eval_precision_ma': 0.9623432895564524, 'eval_recall_ma': 0.9578210438568343, 'eval_runtime': 9.6063, 'eval_samples_per_second': 483.223, 'eval_steps_per_second': 7.599, 'epoch': 1.83}
{'loss': 0.1962, 'learning_rate': 1.1773700305810397e-05, 'epoch': 2.29}
{'eval_loss': 0.07769232243299484, 'eval_accuracy': 0.978026712623869, 'eval_f1': array([0.96184739, 0.98504179, 0.97815004]), 'eval_precision': array([0.96963563, 0.9781564 , 0.98388278]), 'eval_recall': array([0.95418327, 0.99202481, 0.97248371]), 'eval_f1_mi': 0.978026712623869, 'eval_precision_mi': 0.978026712623869, 'eval_recall_mi': 0.978026712623869, 'eval_f1_ma': 0.9750130736531469, 'eval_precision_ma': 0.9772249371959657, 'eval_recall_ma': 0.9728972620291924, 'eval_runtime': 9.458, 'eval_samples_per_second': 490.8, 'eval_steps_per_second': 7.718, 'epoch': 2.29}
and then training stops at checkpoint 500 with the error shown above.
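If the ndarray metrics are indeed the cause, I think converting them to plain Python lists inside compute_metrics should make the trainer state serializable again. A sketch of the change I have in mind (the metric names mirror my logs, but this function is a simplified stand-in, not the one from my script):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)

    # average=None yields per-class scores as NumPy ndarrays
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average=None
    )

    return {
        "accuracy": accuracy_score(labels, preds),
        # .tolist() turns each ndarray into a JSON-serializable list
        "f1": f1.tolist(),
        "precision": precision.tolist(),
        "recall": recall.tolist(),
    }
```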
Here is my script: