Merging bert-base-uncased models after trainer but before predict

Hi everyone,

I created and trained two bert-base-uncased models, using the run_ner.py script from the Hugging Face transformers examples: one to predict the PoS tags and one to predict the DEPREL tags (both are attributes of the CoNLL-U format).

I trained the two models separately on the same dataset, with one change between runs: the first model predicts the labels corresponding to the PoS tags, and the second model predicts the labels corresponding to the DEPREL tags.

Once training finished, I loaded the weights (AutoModelForTokenClassification.from_pretrained), the configurations (AutoConfig.from_pretrained), and the tokenizers (AutoTokenizer.from_pretrained) of the two models using the functions the library provides.
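Roughly, the loading looks like this (a sketch; the checkpoint directories are just placeholders for wherever run_ner.py saved each model):

from transformers import AutoConfig, AutoModelForTokenClassification, AutoTokenizer

# Hypothetical output directories of the two fine-tuning runs
config_pos_tag = AutoConfig.from_pretrained("output/pos_tag_model")
tokenizer_pos_tag = AutoTokenizer.from_pretrained("output/pos_tag_model")
model_pos_tag = AutoModelForTokenClassification.from_pretrained(
    "output/pos_tag_model", config=config_pos_tag
)

config_deprel_tag = AutoConfig.from_pretrained("output/deprel_tag_model")
tokenizer_deprel_tag = AutoTokenizer.from_pretrained("output/deprel_tag_model")
model_deprel_tag = AutoModelForTokenClassification.from_pretrained(
    "output/deprel_tag_model", config=config_deprel_tag
)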

By doing this I get two different trainers, like the following:

trainer_pos_tag = Trainer(
    model=model_pos_tag,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)

trainer_deprel_tag = Trainer(
    model=model_deprel_tag,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)

with their respective configurations and tokenizers.

What I would like to achieve is to use these two models to predict both the PoS tag and the DEPREL tag at the same time (so I think I need a single model that predicts both labels, right?).

Now my problem is that I would need to merge these two models (merge their weights (does that even make sense?) or something like that) before running the prediction step (trainer.predict(test_dataset)).

How can I do this?
Do you have any suggestions?

P.S. The important thing is that the object resulting from the union/merge of these two models is still a Trainer.

Thanks in advance!

The question is also on StackOverflow with the same title (I can’t put more than 2 links in the post because I’m a new user).

I think you would get better results averaging the predictions rather than merging the weights. You can do this by calling trainer_xxx.predict(...) and then averaging the results.
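Something like this (a minimal sketch, assuming both models output logits of the same shape over a shared label list):

# Logits of shape (num_examples, seq_len, num_labels) from each model
preds_pos = trainer_pos_tag.predict(test_dataset).predictions
preds_deprel = trainer_deprel_tag.predict(test_dataset).predictions

# Element-wise average of the two sets of logits
avg_preds = (preds_pos + preds_deprel) / 2.0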

@sgugger WAIT! Are you telling me that after training the two models separately, on the same dataset where I only change the labels to be predicted, I just need to average the values of the metrics computed by the compute_metrics function (i.e. accuracy_score, precision, recall, F1) during prediction on the test dataset, and it is as if I had “combined the two models and made the two predictions at the same time”?

If so, you’ve solved a huge problem for me.

Not the metrics values, the predictions. You can feed those predictions to your metrics and check whether the result is better or worse.

I don’t think I understood…

What the run_ner.py script does is:

# Predict
if training_args.do_predict:
    test_dataset = TokenClassificationDataset(
        token_classification_task=token_classification_task,
        data_dir=data_args.data_dir,
        tokenizer=tokenizer,
        labels=labels,
        model_type=config.model_type,
        max_seq_length=data_args.max_seq_length,
        overwrite_cache=data_args.overwrite_cache,
        mode=Split.test,
    )

    predictions, label_ids, metrics = trainer.predict(test_dataset)
    preds_list, _ = align_predictions(predictions, label_ids)

    output_test_results_file = os.path.join(training_args.output_dir, "test_results.txt")
    if trainer.is_world_master():
        with open(output_test_results_file, "w") as writer:
            for key, value in metrics.items():
                logger.info("  %s = %s", key, value)
                writer.write("%s = %s\n" % (key, value))

and I did it for both my models.

But the metrics are exactly these: accuracy_score, precision, recall, F1.

I don’t understand what you mean by “You can feed those predictions to your metrics and check if it’s better or worse” at the code level.

At the trainer.predict line, you get the predictions from your model (or maybe the preds_list obtained on the line after; I’m not familiar with what align_predictions does). Get those from your two trainers and average them, then feed the result and the labels to the function used to compute the metrics.
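In code, a sketch of the full flow, reusing align_predictions and the seqeval metric functions from run_ner.py (again, this only makes sense if the two models’ logits have the same shape over a shared label list):

from seqeval.metrics import accuracy_score, f1_score, precision_score, recall_score

# Predict with each trainer on the same test set; the first element of
# the output is the raw logits, the second the gold label ids
preds_pos, label_ids, _ = trainer_pos_tag.predict(test_dataset)
preds_deprel, _, _ = trainer_deprel_tag.predict(test_dataset)

# Average the two sets of logits, then let align_predictions argmax
# them into label strings (as run_ner.py does for a single model)
avg_preds = (preds_pos + preds_deprel) / 2.0
preds_list, out_label_list = align_predictions(avg_preds, label_ids)

# Feed the averaged predictions and the labels to the metric functions
print("accuracy =", accuracy_score(out_label_list, preds_list))
print("precision =", precision_score(out_label_list, preds_list))
print("recall =", recall_score(out_label_list, preds_list))
print("f1 =", f1_score(out_label_list, preds_list))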


OK, I think I understand! Thanks so much for your help and patience!