Getting the whole word corresponding to a subword in a text?

I am using AutoTokenizer to tokenize a text, and then, for a given subword's index, I am trying to find the whole word it belongs to. I was using word_ids() to get the whole-word index and whitespace-splitting the text to look up the corresponding whole word. However, when the text contains punctuation, the whole-word count no longer matches the length of the whitespace split. How can I handle this?

import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # example checkpoint
encoded_passage = tokenizer(passage_text)  # passage_text is the raw passage string
whole_word_ids = encoded_passage.word_ids(0)  # word index per subword; None for special tokens
max_wholeword_id = np.nanmax(whole_word_ids[1:-1])  # slice off [CLS]/[SEP]
whitespace_tokens = passage_text.split(' ')
assert max_wholeword_id + 1 == len(whitespace_tokens), "Mismatched length"
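
The mismatch comes from pre-tokenization: fast tokenizers such as BERT's split punctuation off into separate words, so word_ids() counts more words than str.split(' ') produces tokens. One way to sidestep whitespace splitting entirely is to map the subword to its word index and then to character offsets in the original text, using the fast-tokenizer methods token_to_word() and word_to_chars(). A minimal sketch, assuming a fast tokenizer; the checkpoint, sample text, and subword_index below are illustrative:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint
passage_text = "Hello, world! Subwords and punctuation."        # illustrative sample text
encoded_passage = tokenizer(passage_text)

subword_index = 6  # illustrative: index of some non-special subword in the encoding
word_index = encoded_passage.token_to_word(subword_index)   # subword -> whole-word index
char_span = encoded_passage.word_to_chars(word_index)       # whole-word index -> CharSpan(start, end)
whole_word = passage_text[char_span.start:char_span.end]
print(whole_word)

Note that word_to_chars() returns the tokenizer's notion of a word (punctuation counts as its own word), which can be narrower than a whitespace-delimited chunk; if you need the latter, you can expand the returned span outward to the nearest whitespace in passage_text.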