Multiple issues with JSON output of punctuation restoration model

I get weird JSON output when trying this punctuation-restoration model.

According to the author, the model assigns a punctuation mark to each word passed into it.
However, there seems to be an issue with the pre- or post-processing that causes punctuation to be inserted in the middle of words.

  1. Try the input I pasted below. The single word “mathematician” is broken into 3 pieces, and a full stop is inserted in the middle of the word. How is that possible?
  2. I think the start/end params in the output array are supposed to be the positions of the word in the final output string, but they don’t account for the spaces between words, so one word’s end equals the next one’s start. And because of issue #1, we can’t rely on the punctuation mark always being inserted at the end of a word.
  3. In the JSON output there’s a “” (empty string) for some labels that were supposed to be “-”.
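My guess (an assumption on my part, not something the model card confirms) is that issues #1 and #2 are related: the tokenizer splits rare words into subword pieces, each piece gets its own label, and the offsets are computed over a string with no spaces. A minimal sketch of how that would produce exactly what I see (the `predictions` list below is hypothetical, shaped like the JSON output):

```python
# Hypothetical word-level output resembling the model's JSON. The
# start/end offsets count characters in a string with NO spaces, so
# one entry's end always equals the next entry's start (issue #2).
predictions = [
    {"word": "English", "label": "0", "start": 0,  "end": 7},
    {"word": "mathema", "label": ".", "start": 7,  "end": 14},  # subword piece got the "." label
    {"word": "tician",  "label": "0", "start": 14, "end": 20},  # rest of the word
]

def naive_restore(preds):
    """Join each entry with its punctuation label ('0' means none)."""
    out = []
    for p in preds:
        out.append(p["word"] + (p["label"] if p["label"] != "0" else ""))
    return " ".join(out)

print(naive_restore(predictions))  # -> "English mathema. tician"
```

If each subword piece really is treated as its own “word”, a full stop landing mid-word is exactly what you’d expect.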


I’ve never known Warhol through my life who have been I don’t think I’m obsessed is an understatement of zest with puzzles of different types and wanted to get traditional one of them was one of the most curious and intelligent people I’ve ever met Houghton Houghton Conway and I would I’m looking at as Wikipedia unfortunately passed away some time ago but English mathematician and then the theory of finite

I’m new to this, so please bear with me.
Whose fault is this?
Is it the author of the model, who writes a script to generate the output, or is it handled by Hugging Face? I can’t find the script responsible for processing the input/output anywhere.
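In the meantime, a possible workaround (my own sketch, not from the model’s repo): merge subword pieces back into whole words before attaching punctuation. This assumes the XLM-RoBERTa sentencepiece convention, where word-initial pieces carry a leading “▁” marker and continuation pieces don’t:

```python
def merge_pieces(preds):
    """Merge sentencepiece subword pieces back into whole words.

    Assumes word-initial pieces start with the '▁' marker (XLM-RoBERTa
    convention). The merged word keeps the label of its LAST piece,
    since punctuation should attach at the end of the word.
    """
    merged = []
    for p in preds:
        if p["word"].startswith("▁") or not merged:
            merged.append({"word": p["word"].lstrip("▁"),
                           "label": p["label"]})
        else:
            merged[-1]["word"] += p["word"]   # continuation piece
            merged[-1]["label"] = p["label"]  # label of the final piece wins
    return merged

# Hypothetical raw pieces; the stray "." on a mid-word piece disappears
# once the pieces are merged and only the last piece's label is kept.
pieces = [
    {"word": "▁English", "label": "0"},
    {"word": "▁mathema", "label": "."},
    {"word": "tician",   "label": "0"},
]
result = merge_pieces(pieces)
print(" ".join(w["word"] + (w["label"] if w["label"] != "0" else "")
               for w in result))  # -> "English mathematician"
```

Whether the real pipeline exposes the “▁” markers depends on its aggregation settings, so treat this only as an illustration of the idea.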

Model link: oliverguhr/fullstop-punctuation-multilang-large · Hugging Face