We are using a model that automatically applies diacritics to Yoruba text (Davlan/mT5_base_yoruba_adr on Hugging Face). However, we noticed that the output is cut off whenever the input is longer than about 20 characters. For example:
Input: Mo je isu ati eyin ni Ibadan
Output: Mo jẹ́ iṣu àti ẹ̀yìn ní Ì
Expected: Mo jẹ́ iṣu àti ẹ̀yìn ní Ìbàdàn
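To check for this truncation programmatically, here is a minimal sketch that strips the diacritics from the model output and compares its length with the plain input. The helper names `strip_diacritics` and `is_truncated` are made up for illustration, and the check assumes the model only adds diacritics and never changes the base letters or spacing.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks so diacritized output can be compared with plain input."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns 0 for base characters and non-zero for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_truncated(plain_input: str, model_output: str) -> bool:
    """Return True when the output, with diacritics removed, is shorter than the input.

    Assumption: the model only adds diacritics, so a complete output has the
    same number of base characters as the input.
    """
    return len(strip_diacritics(model_output)) < len(plain_input)


plain = "Mo je isu ati eyin ni Ibadan"
observed = "Mo jẹ́ iṣu àti ẹ̀yìn ní Ì"
print(is_truncated(plain, observed))  # True: 23 base characters against 28
```

Running the same check over a batch of inputs of different lengths should show where the cutoff begins.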
How can we fix this?