I am looking to run some NER models via the Inference API, but I am running into some issues. My problem is that the Inference API does not seem to return token positions. Consider this request:
curl -X POST https://api-inference.huggingface.co/models/dslim/bert-base-NER \
-H "Authorization: Bearer <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d "Hello Sarah Jessia Parker who lives in New York."
So it finds the right tokens (and, nicely, returns the identified tokens grouped correctly.) However, there is no indication of where the tokens start in the input text.
Thanks for the note. You are perfectly right in your assumptions.
In earlier versions of tokenizers/transformers the offsets of tokens were not necessarily available (because the encode operation was destructive). Because offsets were not available, the API did not send this information.
However thanks to tokenizers@anthony and in the near future 4.0 version of transformers, the encode operation won’t be destructive anymore, so offsetswill be available.