BERT WordPiece Tokenizer: some matras missing after tokenization for Hindi Language #572

We trained our own BERT WordPiece tokenizer on the following dataset: https://drive.google.com/file/d/12MbWKERa7QPfI9F-xnMgIgAhjbN8EaQK/view?usp=sharing

Words ending with certain matras (e.g. े, the matra of ए) lose these matras in the resulting tokens.
For example, for the sentence "अपने पोस्ट ऑफिस में 420 पदों पर भर्ती"
the tokens were: [‘अपन’, ‘पोस्ट’, ‘ऑफिस’, ‘म’, ‘420’, ‘पदो’, ‘पर’, ‘भर्ती’]
The first word (अपने) and the fourth word (में) are missing the matra of ए (े) after tokenization. In addition, में and पदों have lost their anusvara (ं).

The pretrained Hugging Face tokenizers show the same behaviour: all the uncased tokenizers have exactly this issue, and words ending with the matra of ए (े) lose the matra after tokenization. The cased pretrained tokenizers work fine. For example, “bert-base-multilingual-cased” tokenizes the sentence correctly, whereas “bert-base-multilingual-uncased” has the issue described above.
Tokenization result for “bert-base-multilingual-cased”: [‘अपने’, ‘प’, ‘##ो’, ‘##स्ट’, ‘ऑफ’, ‘##िस’, ‘में’, ‘420’, ‘##0’, ‘पद’, ‘##ों’, ‘पर’, ‘भर’, ‘##्ती’]
Tokenization result for “bert-base-multilingual-uncased”: [‘अपन’, ‘प’, ‘##ो’, ‘##सट’, ‘ऑफ’, ‘##िस’, ‘म’, ‘420’, ‘##0’, ‘पद’, ‘##ो’, ‘पर’, ‘भर’, ‘##ती’]

Why are these matras omitted during tokenization, both by our own tokenizer and by the uncased BERT tokenizers?
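
One way to narrow this down is to check whether accent stripping is responsible. The sketch below is not the tokenizer's own code. It assumes the uncased pipeline behaves like the accent-stripping step of the original BERT basic tokenizer, which NFD-normalizes the text and drops every character in Unicode category Mn (nonspacing mark). The function name `strip_nonspacing_marks` is hypothetical and uses only the Python standard library.

```python
import unicodedata


def strip_nonspacing_marks(text: str) -> str:
    """Assumed model of BERT-style accent stripping (hypothetical helper).

    NFD-normalize the text, then drop every character whose Unicode
    general category is Mn (nonspacing mark).
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


sentence = "अपने पोस्ट ऑफिस में 420 पदों पर भर्ती"
stripped = strip_nonspacing_marks(sentence)

# List the characters that were removed, with their Unicode names and categories.
for ch in sorted(set(sentence) - set(stripped)):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})")

# Show the words as they would look after the assumed stripping step.
print(stripped.split())
```

Under this assumption, े (U+0947), ं (U+0902) and ् (U+094D) are category Mn and get removed, while ो (U+094B), ि (U+093F) and ी (U+0940) are category Mc and survive. That would be consistent with ‘अपन’, ‘म’, ‘पदो’ and ‘##सट’ in the uncased output above. If this is the cause, it may be worth checking the `strip_accents` and `lowercase` arguments of `BertWordPieceTokenizer`: when `strip_accents` is left unset it follows the `lowercase` setting, which is enabled by default.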