BioBERT NER issue

Hello,

I’m trying to implement Hugging Face NER with BioBERT.

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
model = AutoModelForTokenClassification.from_pretrained("dmis-lab/biobert-v1.1")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
sentence = "This expression of NT-3 in supporting cells in embryos and neonates may even preserve in Brn3c null mutants the numerous spiral sensory neurons in the apex of 8-day old animals."

result = nlp(sentence)
print(result)

But the result isn’t what I’m expecting.

Some weights of BertForTokenClassification were not initialized from the model checkpoint at dmis-lab/biobert-v1.1 and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[{'word': 'This', 'score': 0.5616263747215271, 'entity': 'LABEL_1', 'index': 1, 'start': 0, 'end': 4}, {'word': 'expression', 'score': 0.6285454630851746, 'entity': 'LABEL_1', 'index': 2,

The output is pretty clear: I need to train the model.
But I’m not sure whether, even with a trained model, I will manage to get rid of the 'entity': 'LABEL_1' issue.
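On the `LABEL_1` question: those generic names come from the `id2label` mapping in the model config, which falls back to `LABEL_0`, `LABEL_1`, ... when nothing else is specified. A minimal sketch of how named labels could be supplied when loading the checkpoint for fine-tuning, assuming a hypothetical three-tag BIO scheme for genes (the tag names and the `load_for_finetuning` helper are illustrative, not something shipped with BioBERT). The classification head would still have to be trained before the predictions mean anything.

```python
# Hypothetical BIO tag set; replace it with the tags used in your own training data.
TAGS = ["O", "B-gene", "I-gene"]

id2label = dict(enumerate(TAGS))
label2id = {tag: i for i, tag in id2label.items()}


def load_for_finetuning(checkpoint="dmis-lab/biobert-v1.1"):
    """Load BioBERT with a fresh token-classification head that uses named labels."""
    # Imported here so the mappings above can be inspected without transformers installed.
    from transformers import AutoModelForTokenClassification

    return AutoModelForTokenClassification.from_pretrained(
        checkpoint,
        num_labels=len(TAGS),
        id2label=id2label,
        label2id=label2id,
    )


print(id2label)
```

After fine-tuning and saving such a model, the pipeline should report these tag names in the `entity` field instead of `LABEL_1`.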

My desired output would be something like:
https://bern.korea.ac.kr/

With a complete response such as:

{
    "project": "BERN",
    "sourcedb": "",
    "sourceid": "43c1bfdebd3ccb8c9a42d10a22a3be3e8b2fe9ae7601b244b6318d71-Thread-18603546",
    "text": "This expression of NT-3 in supporting cells in embryos and neonates may even preserve in Brn3c null mutants the numerous spiral sensory neurons in the apex of 8-day old animals.",
    "denotations": [
        {
            "id": [
                "HGNC:8020",
                "BERN:324182202"
            ],
            "span": {
                "begin": 19,
                "end": 23
            },
            "obj": "gene"
        },
        {
            "id": [
                "MIM:602460",
                "HGNC:9220",
                "Ensembl:ENSG00000091010",
                "BERN:324351702"
            ],
            "span": {
                "begin": 89,
                "end": 94
            },
            "obj": "gene"
        }
    ],
    "timestamp": "Thu May 27 08:22:14 +0000 2021",
    "logits": {
        "disease": [],
        "gene": [
            [
                {
                    "start": 19,
                    "end": 23,
                    "id": "HGNC:8020\tBERN:324182202"
                },
                0.9999972581863403
            ],
            [
                {
                    "start": 89,
                    "end": 94,
                    "id": "MIM:602460\tHGNC:9220\tEnsembl:ENSG00000091010\tBERN:324351702"
                },
                0.9999972581863403
            ]
        ],
        "drug": [],
        "species": []
    }
}
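
The span part of that response could be approximated from the pipeline output once a fine-tuned model is available. A minimal sketch, assuming the pipeline was created with `aggregation_strategy="simple"` so that each item carries `entity_group`, `start` and `end`. The `to_denotations` helper and the sample entities (including their scores) are made up for illustration, and the `id` fields (HGNC, MIM, ...) are left out because they appear to come from BERN's separate normalization step rather than from the tagger itself.

```python
import json


def to_denotations(text, entities):
    """Convert grouped NER pipeline output into BERN-style denotations."""
    denotations = []
    for ent in entities:
        start, end = int(ent["start"]), int(ent["end"])
        denotations.append({
            "span": {"begin": start, "end": end},
            "obj": ent["entity_group"],
            # Not part of the BERN format; kept only as a sanity check on the offsets.
            "mention": text[start:end],
        })
    return {"text": text, "denotations": denotations}


sentence = (
    "This expression of NT-3 in supporting cells in embryos and neonates "
    "may even preserve in Brn3c null mutants the numerous spiral sensory "
    "neurons in the apex of 8-day old animals."
)

# Hand-written stand-in for what a fine-tuned gene tagger might return.
sample_entities = [
    {"entity_group": "gene", "score": 0.99, "word": "NT-3", "start": 19, "end": 23},
    {"entity_group": "gene", "score": 0.99, "word": "Brn3c", "start": 89, "end": 94},
]

result = to_denotations(sentence, sample_entities)
print(json.dumps(result["denotations"], indent=2))
```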

Am I on the right track to achieve that?
Any help/suggestion is more than welcome!

Cheers,
Vivian

@Vivian Did you have any success with this issue? I am in a similar situation.

Nope.

In order to get something reliable, we deployed this:

Let’s see how we could do it differently in the future :slight_smile:

Hi @Vivian, the warning in your original post says the model needs to be fine-tuned on a downstream task. If I’m not mistaken, we would need labelled data for that. I’m specifically interested in tagging diseases in PubMed files, and I’m not sure how I would fine-tune BioBERT for this task. Do you have any ideas?
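
Fine-tuning does need token-level labels; public disease-annotated corpora such as the NCBI Disease corpus are a common starting point. One step that usually needs care is aligning word-level tags with BERT's subword tokens. A minimal sketch, assuming `word_ids` is the list returned by a fast tokenizer's `word_ids()` method for one sentence. The `align_labels` helper, the tag ids and the example split are illustrative, not taken from a real tokenizer run.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def align_labels(word_ids, word_labels):
    """Give each word's label to its first subword; mask everything else."""
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens such as [CLS] and [SEP] belong to no word.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First subword of a new word carries the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Continuation subwords are ignored by the loss.
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned


# Hypothetical three-word sentence in which the second word is split into two pieces.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [1, 2, 0]  # assumed ids: 0 = O, 1 = B-Disease, 2 = I-Disease

aligned = align_labels(word_ids, word_labels)
print(aligned)
```

Labelling only the first subword is one common convention; copying the label to every subword is another, and which works better depends on the data.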

Also, regarding BERN (and BERN2), is there a Hugging Face implementation available? I checked the link you attached, and apparently about 70 GB of disk space is required to use BERN for NER. I would like to do this in Google Colab. Any idea how I should go about it? Or do you have any experience with NER on biomedical text?

Any help is highly appreciated :slight_smile:

Regards,
Srishti

Hi,

I’m still using BERN with some good results.
I did not find a BERN model on Hugging Face.
Perhaps in the future?

I will take a look at BERN2.
Good luck!

Okay, how are you using it? I mean, is it the web API, or did you get the code running on your own system or in the cloud?

I want to use it for multiple files, so the web API isn’t the way for me. Could you please share how I can get it running on my system?

We downloaded the BERN project in order to run it privately.
Definitely, the web API is not tailored for this kind of purpose.

Unfortunately, I can’t share anything other than the useful BERN readme.

Perhaps you should ask a web developer or DevOps engineer to deploy this solution.
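
For the multiple-files case, the batching itself is independent of which tagger ends up being deployed. A minimal sketch, assuming plain-text inputs in one folder; `annotate` is a hypothetical placeholder for whatever callable you have available (a wrapper around a privately hosted BERN instance, or a fine-tuned pipeline), and `annotate_directory` is an illustrative helper, not part of BERN.

```python
import json
from pathlib import Path


def annotate_directory(input_dir, output_dir, annotate):
    """Run `annotate` on every .txt file in input_dir and write one JSON file per input."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(input_dir.glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        result = annotate(text)  # expected to return a JSON-serializable object
        out_path = output_dir / (path.stem + ".json")
        out_path.write_text(json.dumps(result, indent=2), encoding="utf-8")
        written.append(out_path)
    return written


def dummy_annotate(text):
    """Stand-in annotator that finds nothing; swap in a real one."""
    return {"text": text, "denotations": []}
```

Calling `annotate_directory("abstracts", "annotations", dummy_annotate)` would then write one `.json` file per `.txt` file; the folder names here are only examples.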
