How to map generated characters to tokens?

Is there an equivalent of BatchEncoding.char_to_token for the outputs of model.generate?

For example, suppose calling tokenizer.decode(model.generate(...)["sequences"]) gave the following output:

"the cat sat on the mat"

where the output of tokenizer.convert_ids_to_tokens(model.generate(...)["sequences"]) may look like:

["_", "the", "_ca", "t", "_sat", "_on", "_", "the", "_mat"]

How would I get the token indices ([2, 3, 4]) for the part of the output corresponding to "cat sat" (e.g. if I wanted to inspect the attention)?
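Because each generated token is just a string, a char-to-token lookup can be built by hand. It records where each token starts in "".join(tokens) and binary-searches that table. This is a minimal sketch, not a transformers API: build_char_to_token and span_to_tokens are hypothetical helpers. It assumes the token strings are available as in the list above. Spaces in the query are swapped for the word-boundary marker. That marker is "_" in this example but is tokenizer-specific in practice (e.g. "▁" for SentencePiece-based tokenizers, "Ġ" for GPT-2's byte-level BPE).

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Callable, List, Sequence


def build_char_to_token(tokens: Sequence[str]) -> Callable[[int], int]:
    """Map a character index in "".join(tokens) to the index of the token containing it."""
    # starts[i] is the offset at which tokens[i] begins in the joined string
    starts = [0, *accumulate(len(t) for t in tokens)]
    total = starts.pop()  # the final running sum is the joined length

    def char_to_token(char_index: int) -> int:
        if not 0 <= char_index < total:
            raise IndexError(f"character index {char_index} out of range")
        # the last token starting at or before char_index contains it
        return bisect_right(starts, char_index) - 1

    return char_to_token


def span_to_tokens(tokens: Sequence[str], substring: str, space_marker: str = "_") -> List[int]:
    """Token indices covering the first occurrence of substring in the joined tokens.

    space_marker is an assumption: it must match the tokenizer's word-boundary symbol.
    """
    query = substring.replace(" ", space_marker)
    start = "".join(tokens).find(query)
    if start == -1 or not query:
        raise ValueError(f"{query!r} not found in the joined tokens")
    char_to_token = build_char_to_token(tokens)
    first = char_to_token(start)
    last = char_to_token(start + len(query) - 1)
    return list(range(first, last + 1))


tokens = ["_", "the", "_ca", "t", "_sat", "_on", "_", "the", "_mat"]
print(span_to_tokens(tokens, "cat sat"))  # [2, 3, 4]
```

Only the first occurrence of the query is matched. To target a later one, pass an explicit character range to char_to_token instead.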

If there were an equivalent of BatchEncoding.char_to_token, I could get the character indices of "cat sat" from "".join(tokenizer.convert_ids_to_tokens(model.generate(...)["sequences"])) and then call char_to_token on them. As far as I am aware, though, nothing like this exists. Is there an alternative method?
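One possible alternative works on the decoded text rather than on raw token strings. The idea is to decode growing prefixes of the generated ids and record how much the text grows at each step. This is a sketch under a loud assumption: decode(ids[:i]) must always be a prefix of decode(ids). That can break, for example, when a multi-byte character is split across tokens or when decoding cleans up spaces. token_char_spans, tokens_for_substring and the toy vocabulary are hypothetical stand-ins. With a real tokenizer you would pass something like lambda ids: tokenizer.decode(ids) together with the ids of a single generated sequence.

```python
from typing import Callable, List, Sequence, Tuple

Decode = Callable[[Sequence[int]], str]


def token_char_spans(ids: Sequence[int], decode: Decode) -> List[Tuple[int, int]]:
    """(start, end) character span of each token in decode(ids).

    Assumes every decoded prefix is a prefix of the full decoded text.
    Costs one decode call per token, i.e. quadratic work overall.
    """
    spans = []
    prev_len = 0
    for i in range(len(ids)):
        cur_len = len(decode(ids[: i + 1]))
        spans.append((prev_len, cur_len))
        prev_len = cur_len
    return spans


def tokens_for_substring(ids: Sequence[int], decode: Decode, substring: str) -> List[int]:
    """Indices of the tokens whose spans overlap the first occurrence of substring."""
    text = decode(ids)
    start = text.find(substring)
    if start == -1:
        raise ValueError(f"{substring!r} not found in {text!r}")
    end = start + len(substring)
    return [
        i
        for i, (s, e) in enumerate(token_char_spans(ids, decode))
        if s < end and e > start  # half-open spans [s, e) and [start, end) intersect
    ]


# Hypothetical toy tokenizer mirroring the example above: "_" marks a space.
VOCAB = ["_", "the", "_ca", "t", "_sat", "_on", "_mat"]


def toy_decode(ids: Sequence[int]) -> str:
    return "".join(VOCAB[i] for i in ids).replace("_", " ").lstrip()


generated_ids = [0, 1, 2, 3, 4, 5, 0, 1, 6]
print(toy_decode(generated_ids))  # the cat sat on the mat
print(tokens_for_substring(generated_ids, toy_decode, "cat sat"))  # [2, 3, 4]
```

Unlike the joined-token approach, this searches the human-readable decoded string, so "cat sat" can be looked up with its ordinary space.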