Enhanced word_ids() API for Chinese or CJK languages?

Is there an API like tokenizer.word_ids() that maps/aligns sub-words to whole words for CJK languages? word_ids() works well for whitespace-tokenizable languages like Farsi and Russian, but I am having difficulty mapping Chinese sub-words back to whole words in order to get whole-word vocabulary embeddings.
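
The only workaround I can think of is to pre-segment the text with an external word segmenter and then pass the word list to the tokenizer with is_split_into_words=True, so that word_ids() maps sub-words back to those segments instead of to individual characters. A minimal sketch of what I mean, using jieba for segmentation and bert-base-chinese as the model (both are just example choices, not requirements):

```python
import jieba
from transformers import AutoTokenizer

# Any fast tokenizer for a Chinese model would do; bert-base-chinese is just an example.
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")

text = "我喜欢自然语言处理"
words = jieba.lcut(text)  # pre-segmented words, e.g. ['我', '喜欢', '自然语言', '处理']

# is_split_into_words=True tells the tokenizer the input is already split into words,
# so word_ids() maps each sub-word token to the index of its jieba word.
encoding = tokenizer(words, is_split_into_words=True)
print(encoding.word_ids())  # e.g. [None, 0, 1, 1, 2, 2, 2, 2, 3, 3, None] (None = special tokens)
```

But this depends on an external segmenter rather than the tokenizer itself, so I would like to know whether there is (or could be) a built-in API for this kind of whole-word alignment in CJK languages.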