This may be the most beginner question of all.
I just started learning about NLP and Hugging Face. The first thing I’m trying to do is apply one of the BioBERT models to some clinical note data and see what I get, before moving on to fine-tuning the model. It looks like “emilyalsentzer/Bio_ClinicalBERT” is the closest model for my data.
But whenever I try to use it for any of the analyses, I always get this warning:
Some weights of the model checkpoint at emilyalsentzer/Bio_ClinicalBERT were not used when initializing BertForSequenceClassification: [‘cls.predictions.transform.dense.bias’, ‘cls.seq_relationship.bias’, ‘cls.predictions.transform.dense.weight’, ‘cls.seq_relationship.weight’, ‘cls.predictions.bias’, ‘cls.predictions.transform.LayerNorm.weight’, ‘cls.predictions.transform.LayerNorm.bias’, ‘cls.predictions.decoder.weight’]
From the Hugging Face course, I understand this to mean the following:
This is because BERT has not been pretrained on classifying pairs of sentences, so the head of the pretrained model has been discarded and a new head suitable for sequence classification has been added instead. The warnings indicate that some weights were not used (the ones corresponding to the dropped pretraining head) and that some others were randomly initialized (the ones for the new head). It concludes by encouraging you to train the model, which is exactly what we are going to do now.
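To make that explanation concrete for myself, here is a minimal sketch of the idea in plain Python. It is only an illustration of matching parameter names between a checkpoint and a model, not how the transformers library actually implements loading. The function `compare_keys` is a hypothetical helper, and the key sets are a small hand-picked sample, not read from the real checkpoint.

```python
# Sample parameter names only (an assumption for illustration);
# a real checkpoint contains many more entries.
checkpoint_keys = {
    "bert.embeddings.word_embeddings.weight",
    "bert.encoder.layer.0.attention.self.query.weight",
    "cls.predictions.bias",
    "cls.predictions.decoder.weight",
    "cls.seq_relationship.weight",
    "cls.seq_relationship.bias",
}

# Names a sequence-classification model would expect: same encoder body,
# but a "classifier" head instead of the pretraining "cls" head.
model_keys = {
    "bert.embeddings.word_embeddings.weight",
    "bert.encoder.layer.0.attention.self.query.weight",
    "classifier.weight",
    "classifier.bias",
}


def compare_keys(checkpoint_keys, model_keys):
    """Hypothetical helper: split names into loaded, unused, and new."""
    loaded = sorted(checkpoint_keys & model_keys)             # body weights reused
    unused = sorted(checkpoint_keys - model_keys)             # dropped pretraining head
    newly_initialized = sorted(model_keys - checkpoint_keys)  # randomly initialized head
    return loaded, unused, newly_initialized


loaded, unused, newly_initialized = compare_keys(checkpoint_keys, model_keys)
print("loaded from checkpoint:", loaded)
print("in checkpoint but not used:", unused)
print("in model but newly initialized:", newly_initialized)
```

In this picture, the `cls.*` names listed in my warning would be the "not used" group, and the new task head would be the "newly initialized" group.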
So I went on to test which NLP tasks I can use “emilyalsentzer/Bio_ClinicalBERT” for out of the box:
from transformers import pipeline, AutoModel

checkpoint = "emilyalsentzer/Bio_ClinicalBERT"
nlp_task = [
    'conversational',
    'feature-extraction',
    'fill-mask',
    'ner',
    'question-answering',
    'sentiment-analysis',
    'text-classification',
    'token-classification',
    'zero-shot-classification',
]

for task in nlp_task:
    print(task)
    process = pipeline(task=task, model=checkpoint)
It turned out that I shouldn’t use (or at least am advised not to use) the model for any of these tasks, since all of them produced the warning message. This really confuses me. The original Bio_ClinicalBERT paper stated that they had good results on a few different tasks, so surely the model was trained for those tasks. I have a similar issue with other models as well: a blog post or research paper says a model obtained good results on a specific task, but when I try to apply it with pipeline, I get the warning message. Is there any reason why the head layers were not included in the model?
I only have a few hundred clinical notes (also unannotated), so the dataset doesn’t look big enough for training. Is there any way I could use the model on my data?