NER on SageMaker Ground Truth annotations

Does anybody have a public sample showing how to run NER on annotations coming from SageMaker Ground Truth NER?

Hey @OlivierCR,

Sorry, I don’t have an example using NER annotations coming from SageMaker Ground Truth NER.

Using the example you sent me, here is how the two formats compare:

HF Datasets (conll2003)

{'chunk_tags': [11, 21, 11, 12, 21, 22, 11, 12, 0],
 'id': '0',
 'ner_tags': [3, 0, 7, 0, 0, 0, 7, 0, 0],
 'pos_tags': [22, 42, 16, 21, 35, 37, 16, 21, 7],
 'tokens': ['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.']}

SM Ground Truth NER (doc)

{
  "crowd-entity-annotation": {
    "entities": [
      { "endOffset": 26, "label": "software", "startOffset": 0 },
      { "endOffset": 38, "label": "version", "startOffset": 35 },
      { "endOffset": 88, "label": "software", "startOffset": 84 },
      { "endOffset": 90, "label": "version", "startOffset": 89 },
      { "endOffset": 93, "label": "version", "startOffset": 92 },
      { "endOffset": 100, "label": "version", "startOffset": 98 }
    ]
  }
}

You could use load_dataset to load the JSON files coming from SM Ground Truth, then use dataset.map() to iterate over the examples and convert the character-offset entities into the token-level format that datasets uses (as in the conll2003 example above).
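To make the conversion step more concrete, here is a rough sketch of a function you could pass to dataset.map(). It is not an official example, and it rests on a few assumptions: each record has a "text" field with the raw sentence and an "entities" list like the one above (both field names are hypothetical, the real keys depend on your labeling job output), offsets are treated as half-open [startOffset, endOffset), and tokens are split on whitespace only.

```python
import re


def entities_to_bio(example):
    """Turn character-offset entities into token-level BIO tags.

    Assumes (hypothetically) that `example` has a "text" string and an
    "entities" list of {"startOffset", "endOffset", "label"} dicts,
    with endOffset treated as exclusive.
    """
    text = example["text"]
    entities = example["entities"]
    tokens, tags = [], []

    # Whitespace tokenization; each match keeps its character span.
    for match in re.finditer(r"\S+", text):
        start, end = match.span()
        tag = "O"
        for ent in entities:
            # A token belongs to an entity if their character spans overlap.
            if start < ent["endOffset"] and end > ent["startOffset"]:
                # The first token of an entity gets B-, later ones I-.
                prefix = "B" if start <= ent["startOffset"] else "I"
                tag = f"{prefix}-{ent['label']}"
                break
        tokens.append(match.group())
        tags.append(tag)

    return {"tokens": tokens, "ner_tags": tags}


# Made-up record for illustration only (not real Ground Truth output).
sample = {
    "text": "Amazon SageMaker 2.0 runs on Python 3",
    "entities": [
        {"startOffset": 0, "endOffset": 16, "label": "software"},
        {"startOffset": 17, "endOffset": 20, "label": "version"},
    ],
}
converted = entities_to_bio(sample)
print(list(zip(converted["tokens"], converted["ner_tags"])))
```

After loading your files with load_dataset("json", data_files=...), something like dataset.map(entities_to_bio) should add the two columns. Note that conll2003 stores ner_tags as integer ids, so you would still need to map these string tags to ids with your own label list, and the tags would need to be realigned if your model tokenizer splits words into sub-tokens.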
