Output of 'bert-base-NER-uncased' differs between the hosted website and the Python API

Hi,

I’m trying to use the above-mentioned model for token classification.

Below is my sample text:

00:00:02 Speaker 1: hi john, it’s nice to see you again. how was your weekend? do anything special? 00:00:06 Speaker 2: yep, all good thanks. i was with my sister in derby. We saw, you know, that james bond film. what’s it called? then got a couple of drinks at the pitcher and piano, back in nottingham. 00:00:18 Speaker 1: that’s close to your flat, right? 00:00:25 Speaker 2: yeah, about five minutes away. i live on parliament street, remember? 00:00:39 Speaker 1: of course, i remember. you moved last year after you left your parents’ place. 00:00:39 Speaker 2: yeah, it was my sister’s birthday on sunday, susie, the older one. i told you last time about that new job she got. sainsbury’s, the one by victoria centre.

When using the hosted Inference API, the output is excellent:

[screenshot of the hosted inference widget output]

And here is the json output from the hosted API:

[
  {
    "entity_group": "PER",
    "score": 0.9778427481651306,
    "word": "john",
    "start": 23,
    "end": 27
  },
  {
    "entity_group": "LOC",
    "score": 0.9929279685020447,
    "word": "derby",
    "start": 166,
    "end": 171
  },
  {
    "entity_group": "MISC",
    "score": 0.7170370817184448,
    "word": "james bond",
    "start": 196,
    "end": 206
  },
  {
    "entity_group": "LOC",
    "score": 0.993842363357544,
    "word": "nottingham",
    "start": 293,
    "end": 303
  },
  {
    "entity_group": "LOC",
    "score": 0.9108084440231323,
    "word": "parliament street",
    "start": 420,
    "end": 437
  },
  {
    "entity_group": "PER",
    "score": 0.9840036034584045,
    "word": "susie",
    "start": 613,
    "end": 618
  },
  {
    "entity_group": "ORG",
    "score": 0.9001737236976624,
    "word": "sai",
    "start": 684,
    "end": 687
  },
  {
    "entity_group": "LOC",
    "score": 0.9343950748443604,
    "word": "##nsbury's",
    "start": 687,
    "end": 695
  },
  {
    "entity_group": "LOC",
    "score": 0.7310423851013184,
    "word": "victoria centre",
    "start": 708,
    "end": 723
  }
]

But when used via the python API using the following code:

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER-uncased")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER-uncased")

nlp = pipeline("token-classification", model=model, tokenizer=tokenizer)

example = """00:00:02 Speaker 1: hi john, it's nice to see you again. how was your weekend? do anything special? 00:00:06 Speaker 2: yep, all good thanks. i was with my sister in derby. We saw, you know, that james bond film. what's it called? then got a couple of drinks at the pitcher and piano, back in nottingham. 00:00:18 Speaker 1: that's close to your flat, right? 00:00:25 Speaker 2: yeah, about five minutes away. i live on parliament street, remember? 00:00:39 Speaker 1: of course, i remember. you moved last year after you left your parents' place. 00:00:39 Speaker 2: yeah, it was my sister's birthday on sunday, susie, the older one. i told you last time about that new job she got. sainsbury's, the one by victoria centre."""

ner_results = nlp(example)
print(ner_results)
print(len(ner_results))

I get very different results. Here is the output of the code:

[{'entity': 'B-PER', 'score': 0.97784275, 'index': 10, 'word': 'john', 'start': 23, 'end': 27}, {'entity': 'B-LOC', 'score': 0.99292797, 'index': 50, 'word': 'derby', 'start': 166, 'end': 171}, {'entity': 'B-MISC', 'score': 0.8592305, 'index': 59, 'word': 'james', 'start': 196, 'end': 201}, {'entity': 'I-MISC', 'score': 0.5748464, 'index': 60, 'word': 'bond', 'start': 202, 'end': 206}, {'entity': 'B-LOC', 'score': 0.9938424, 'index': 83, 'word': 'nottingham', 'start': 293, 'end': 303}, {'entity': 'B-LOC', 'score': 0.8480199, 'index': 121, 'word': 'parliament', 'start': 420, 'end': 430}, {'entity': 'I-LOC', 'score': 0.973597, 'index': 122, 'word': 'street', 'start': 431, 'end': 437}, {'entity': 'B-PER', 'score': 0.9840036, 'index': 172, 'word': 'susie', 'start': 613, 'end': 618}, {'entity': 'B-ORG', 'score': 0.90017325, 'index': 190, 'word': 'sai', 'start': 684, 'end': 687}, {'entity': 'I-LOC', 'score': 0.93890965, 'index': 191, 'word': '##ns', 'start': 687, 'end': 689}, {'entity': 'I-LOC', 'score': 0.8916274, 'index': 192, 'word': '##bury', 'start': 689, 'end': 693}, {'entity': 'I-LOC', 'score': 0.9475074, 'index': 193, 'word': "'", 'start': 693, 'end': 694}, {'entity': 'I-LOC', 'score': 0.9595369, 'index': 194, 'word': 's', 'start': 694, 'end': 695}, {'entity': 'B-LOC', 'score': 0.55478203, 'index': 199, 'word': 'victoria', 'start': 708, 'end': 716}, {'entity': 'I-LOC', 'score': 0.90730333, 'index': 200, 'word': 'centre', 'start': 717, 'end': 723}]

As can be seen, it detects 15 entities, far more than the hosted API. It even tags the 's of sainsbury's as I-LOC, which is clearly wrong and makes the result unusable.

Why the difference in results? Am I doing something wrong in the code?

Thanks

You need to add aggregation_strategy="simple" to your pipeline creation, to tell the pipeline to group together the tokens belonging to the same entity.
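
For illustration, a minimal sketch of the corrected pipeline creation, reusing the model name and example text from the question (exact scores may still vary slightly between transformers versions):

from transformers import pipeline

# aggregation_strategy="simple" merges the raw B-/I- sub-token predictions
# into whole entities, matching the grouped output of the hosted API.
nlp = pipeline(
    "token-classification",
    model="dslim/bert-base-NER-uncased",
    aggregation_strategy="simple",
)

ner_results = nlp(example)
print(ner_results)  # entries now use "entity_group", as in the hosted JSON above

With aggregation enabled, the local output follows the same "entity_group" format as the hosted API's JSON instead of listing each sub-token separately.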
