How does a tokenizer (e.g., AutoTokenizer) generate word_ids integers?

Context of the question (scroll down to get to the Real question):

I need to find the start_position and end_position of the sequence that I input to a QA model (e.g., RoBERTa, LLMv3) from the start and end character positions of the answer in the context. @NielsRogge, in his notebook (Transformers-Tutorials/LayoutLMv2/DocVQA/Fine_tuning_LayoutLMv2ForQuestionAnswering_on_DocVQA.ipynb at master · NielsRogge/Transformers-Tutorials · GitHub), does this by using the word_ids of the tokenized context together with the start and end token indices of the answer in the given context.
e.g.,
context_token_indices = [0, 1, 2, 3, 4, 5, 6, 7, || 8, 9, 10, 11, 12, 13 ||, 14, 15, 16, 17, 18]  # pipes mark the span of the answer in the tokenized context
ans_start_token_idx_in_context = 8
ans_end_token_idx_in_context = 13
context_word_ids = [0, 1, 2, 3, 3, 4, 5, 6, 7, || 8, 8, 9, 10, 10, 10, 11, 12, 13 ||, 14, 15, 16, 17, 18, 18]  # pipes mark the span of the answer in the tokenized context
(FYI: repeated indices appear when a word like "bookworm" is broken into "book" and "worm" by the tokenizer.)
we infer that…
start_position = 9
end_position = 17
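
For reference, here is a minimal sketch of how I understand the word_ids-based mapping (my own sketch, not @NielsRogge's exact code; "roberta-base" and the toy words/answer are just placeholders):

```python
# Minimal sketch (my own, not the notebook's code): map answer word indices to
# token-level start_position/end_position via word_ids() of a fast tokenizer.
from transformers import AutoTokenizer

# "roberta-base" is just an example checkpoint; add_prefix_space=True is needed
# for RoBERTa-style tokenizers when passing pre-split words.
tokenizer = AutoTokenizer.from_pretrained("roberta-base", add_prefix_space=True)

words = ["The", "bookworm", "read", "every", "night"]   # pre-split context (toy example)
ans_start_word_idx, ans_end_word_idx = 1, 2             # answer = "bookworm read"

encoding = tokenizer(words, is_split_into_words=True)
word_ids = encoding.word_ids()   # e.g. [None, 0, 1, 1, 2, 3, 4, None] (None = special tokens)

# start_position: first token whose word_id equals the answer's first word index
start_position = next(i for i, w in enumerate(word_ids) if w == ans_start_word_idx)
# end_position: last token whose word_id equals the answer's last word index
end_position = max(i for i, w in enumerate(word_ids) if w == ans_end_word_idx)
print(start_position, end_position)
```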

Therefore, the success of the whole process depends on how we split the context into words.

@NielsRogge used match, word_idx_start, word_idx_end = subfinder(words, answer.split()) (it takes the first match, and it doesn't work for me), which amounts to matching against context.split(); but splitting on whitespace fails to separate ",", ".", etc. the way AutoTokenizer does.
e.g., "Great! Keep it up." must be tokenized as ["Great", "!", "Keep", "it", "up", "."]
=> context_token_indices = [0, 1, 2, 3, 4, 5], not [0, 1, 2, 3]
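
For context, this is roughly how I read subfinder (my own reconstruction, not the notebook's exact code), and why a plain whitespace split trips it up:

```python
# Rough reconstruction of subfinder: return the first run of words matching
# answer.split(), plus its start/end word indices.
def subfinder(words, answer_words):
    for i in range(len(words) - len(answer_words) + 1):
        if words[i:i + len(answer_words)] == answer_words:
            return words[i:i + len(answer_words)], i, i + len(answer_words) - 1
    return None, 0, 0

context = "Great! Keep it up."
words = context.split()                       # ['Great!', 'Keep', 'it', 'up.']
print(subfinder(words, "Keep it".split()))    # match found
print(subfinder(words, "it up".split()))      # no match: 'up' != 'up.'
```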
So I tried nltk.word_tokenize. Though it does better, it again fails to split at "-".
e.g., "Great! Keep-it-up." must be tokenized as ["Great", "!", "Keep", "-", "it", "-", "up", "."]
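
A quick illustration of what I mean (assumes nltk and its punkt data are installed):

```python
# nltk.word_tokenize separates the punctuation, but keeps hyphenated words together.
import nltk
# nltk.download("punkt")  # uncomment on first use

print(nltk.word_tokenize("Great! Keep it up."))
# ['Great', '!', 'Keep', 'it', 'up', '.']   -> this is what I want
print(nltk.word_tokenize("Great! Keep-it-up."))
# ['Great', '!', 'Keep-it-up', '.']         -> the '-' is not split out
```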
(Using offset_mapping instead of word_ids brings other problems: the offsets from RoBERTa and LLMv3 look very different.)
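
For completeness, offset_mapping gives character spans per token, which could in principle be intersected with the answer's character span; here is a minimal sketch of inspecting those offsets ("roberta-base" is just an example checkpoint):

```python
# Minimal sketch: inspect per-token character offsets via return_offsets_mapping
# (fast tokenizers only). Note: exactly how offsets are computed (e.g. whether a
# leading space counts) can differ between tokenizers.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
context = "Great! Keep it up."
enc = tokenizer(context, return_offsets_mapping=True, add_special_tokens=False)

tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"])
for token, (start, end) in zip(tokens, enc["offset_mapping"]):
    print(token, (start, end), repr(context[start:end]))
```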
I wonder how many other such edge cases exist. Thus the important point to be clarified is…

Real question:
"How do we need to tokenize the context to find start_position and end_position for the question-answering task?" This can be understood if we know "how Hugging Face tokenizers generate the integers in word_ids".
Can anyone please answer either of the questions in quotes?