Truncated Inputs to our model

ajesujoba · April 11, 2021, 11:07pm

We have a model that automatically applies diacritics to text [Davlan/mT5_base_yoruba_adr · Hugging Face]. However, we noticed that for any input longer than about 20 characters, the output is cut off. For example:

Input: Mo je isu ati eyin ni Ibadan
Output: Mo jẹ́ iṣu àti ẹ̀yìn ní Ì
instead of
Mo jẹ́ iṣu àti ẹ̀yìn ní Ìbàdàn

How can we fix this?
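
One possible cause, assuming the model is called through `generate()` with default settings: in the Transformers library, generation stops at a default `max_length` of 20 tokens unless a limit is passed explicitly, which would cut the output short even though the input itself is read in full. The sketch below passes `max_new_tokens` to `generate()`. The helper name `diacritize` and the cap of 128 tokens are illustrative choices, not something taken from the model card.

```python
def diacritize(text, model_name="Davlan/mT5_base_yoruba_adr", max_new_tokens=128):
    """Restore diacritics, with an explicit cap on the number of generated tokens.

    `diacritize` and the default cap of 128 are hypothetical choices for
    illustration; adjust the cap to the longest output you expect.
    """
    # Imported lazily so the helper can be defined before transformers is loaded.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Tokenize the undiacritized sentence into PyTorch tensors.
    inputs = tokenizer(text, return_tensors="pt")

    # Without an explicit limit, generate() falls back to the default
    # max_length (20 tokens), which can end the output early.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the model on first use):
# print(diacritize("Mo je isu ati eyin ni Ibadan"))
```

If the model is served through a `pipeline` or the hosted inference widget instead, the same limit would have to be set there, for example through the generation settings stored with the model.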
