Unsure if I am correctly loading .from_pretrained models

Hello, I am trying to do some NER with pretrained Hugging Face models, and I am not sure whether I am loading them incorrectly. For my test example I am using the following text:

“The Youngs Modulus of E-Glass Fibre was found to be 72 GPa.”

I have tried running the code below with each of the pre-trained Hugging Face models and tokenizers listed below, but every one of them returns the entire text with entities tagged as simply LABEL_1 or LABEL_0. I would expect at least one of them to identify the material, the material property, or at least the numerical value, especially since SciBERT, MatSciBERT and MatBERT were trained on scientific corpora. I am not sure how I should proceed.

The Hugging Face models used:

  • bert-base-uncased
  • dslim/bert-base-NER
  • allenai/scibert_scivocab_uncased
  • m3rg-iitd/matscibert
  • alan-yahya/MatBERT
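
Since LABEL_0/LABEL_1 are the default names transformers assigns to a classification head that was never fine-tuned, here is a minimal sketch of how each checkpoint's label map could be inspected before running the pipeline (using AutoConfig; I have not confirmed what each of these models actually ships with):

from transformers import AutoConfig

# A fine-tuned NER head ships meaningful tags (e.g. B-PER, I-ORG);
# a freshly initialized head falls back to the default LABEL_0 / LABEL_1.
for name in [
    "bert-base-uncased",
    "dslim/bert-base-NER",
    "allenai/scibert_scivocab_uncased",
    "m3rg-iitd/matscibert",
    "alan-yahya/MatBERT",
]:
    config = AutoConfig.from_pretrained(name)
    print(name, config.id2label)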

My code is here:

import pandas as pd
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Change this to any of the checkpoints listed above
pretrained_name = "alan-yahya/MatBERT"

# Load the tokenizer and token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained(pretrained_name)
model = AutoModelForTokenClassification.from_pretrained(pretrained_name)

# Build an NER pipeline and run it on the test sentence
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
text = "The Youngs Modulus of E-Glass Fibre was found to be 72 GPa."
results = nlp(text)
print(pd.DataFrame(results))

And my output is every token of the sentence tagged as either LABEL_0 or LABEL_1, as described above.
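
For comparison, here is a sketch of the same sentence run through dslim/bert-base-NER, the one checkpoint above that I know ships a fine-tuned head (trained on CoNLL-2003, so its PER/ORG/LOC/MISC tag set may not cover materials terms at all); aggregation_strategy="simple" merges word pieces back into whole words:

import pandas as pd
from transformers import pipeline

# dslim/bert-base-NER was fine-tuned on CoNLL-2003, so it should emit
# real entity tags (PER/ORG/LOC/MISC) rather than LABEL_0 / LABEL_1.
nlp = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
text = "The Youngs Modulus of E-Glass Fibre was found to be 72 GPa."
print(pd.DataFrame(nlp(text)))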
