Customization of Wav2Vec2CTCTokenizer with rules

Hi, my goal is to fine-tune an ASR model, WavLM, that relies on the pretrained tokenizer Wav2Vec2CTCTokenizer.

I want to fine-tune this ASR model on another language and have the tokenization follow phonological rules, such as syllable segmentation.

If I provide a vocabulary containing all possible syllables (i.e. my tokens), is it possible to customize the segmentation performed by Wav2Vec2CTCTokenizer so that it respects syllable segmentation rules?


Original sentence:
Il tentativo era cosi bello

Segmentation made by Wav2Vec2CTCTokenizer (not respecting syllabification rules):
['il', 'ten', 'tat', 'iv', 'o', 'Er', 'a', 'kos', 'i', 'bEl', 'lo']

Expected segmentation according to syllabification rules:
['il', 'ten', 'ta', 'ti', 'vo', 'E', 'ra', 'ko', 'si', 'bEl', 'lo']

Basically, I need to build some rules into the tokenizer, for example giving priority to tokens that place a consonant in the onset of a syllable rather than in the coda of the preceding one.
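
To make the rule concrete, here is a minimal sketch of the kind of onset-first segmentation I mean. It is written as a standalone pre-processing step in plain Python, not as part of Wav2Vec2CTCTokenizer. The vowel set, the list of allowed onset clusters, and the function names are placeholders I made up for illustration. Every vowel is treated as its own nucleus, so diphthongs are not handled.

```python
# Assumed phoneme inventory: these sets are illustrative placeholders only.
VOWELS = set("aeiouEO")
ONSET_CLUSTERS = {"pr", "br", "tr", "dr", "kr", "gr", "fr",
                  "pl", "bl", "kl", "gl", "fl"}


def is_legal_onset(cluster):
    """An empty onset or a single consonant is always allowed;
    longer onsets must be listed in ONSET_CLUSTERS."""
    return len(cluster) <= 1 or cluster in ONSET_CLUSTERS


def syllabify(word):
    """Split one phonemic word into syllables, giving consonants
    to the following onset whenever that onset is legal."""
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not nuclei:
        return [word]
    cuts = [0]
    for left, right in zip(nuclei, nuclei[1:]):
        cluster = word[left + 1:right]  # consonants between two vowels
        k = 0
        # Move consonants into the coda only until the rest is a legal onset.
        while not is_legal_onset(cluster[k:]):
            k += 1
        cuts.append(left + 1 + k)
    cuts.append(len(word))
    return [word[a:b] for a, b in zip(cuts, cuts[1:])]


def segment(sentence):
    """Syllabify every whitespace-separated word of a phonemic sentence."""
    return [syl for word in sentence.split() for syl in syllabify(word)]


print(segment("il tentativo Era kosi bEllo"))
```

On the phonemic form of the sentence above, this produces the expected segmentation. What I don't know is whether logic like this can live inside the tokenizer itself, or whether it has to run before the text reaches it.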

Is it possible to add rules of this kind to the tokenizer?
If so, where can I modify these parameters?

If not, and I train a new tokenizer instead, can I use it with the pre-trained WavLM model that I need to fine-tune?

Thanks in advance!