I have been able to fine-tune a T5 model for an SQL-to-natural-language task; however, the natural language was English.
Now I am trying to fine-tune a T5 model to split a word into syllables, for example:
joão → jo|ão|
camarão → ca|ma|rão|
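To make the target format concrete, here is a minimal sketch of how such input/output pairs could be built, assuming the syllables of each training word are already available as a list. The helper names `to_target` and `from_target` are hypothetical and only for illustration.

```python
def to_target(syllables):
    """Join syllables into the target format, with "|" after every syllable."""
    return "".join(s + "|" for s in syllables)


def from_target(target):
    """Recover the syllable list from a target string such as "jo|ão|"."""
    return [s for s in target.split("|") if s]


# The two examples from this post, as (input word, target string) pairs.
examples = {"joão": ["jo", "ão"], "camarão": ["ca", "ma", "rão"]}
pairs = [(word, to_target(syls)) for word, syls in examples.items()]
```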
The problem is that the t5-base model does not recognize accented characters, so it outputs ?? instead of ã.
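A quick way to see which characters are affected is to encode each character on its own and look for the unknown-token ID. The sketch below assumes only an `encode` callable that returns a list of token IDs; `chars_mapped_to_unk` and the toy tokenizer are hypothetical stand-ins, not T5's actual vocabulary.

```python
def chars_mapped_to_unk(words, encode, unk_id):
    """Return the sorted characters from `words` whose encoding contains unk_id."""
    chars = sorted({ch for word in words for ch in word})
    return [ch for ch in chars if unk_id in encode(ch)]


# Toy stand-in for a real tokenizer: it knows ASCII lowercase letters only.
# This is an assumption for illustration, not the real t5-base vocabulary.
TOY_UNK_ID = 0
toy_vocab = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz", start=1)}


def toy_encode(text):
    """Map each character to its toy ID, or to TOY_UNK_ID if it is not in the vocabulary."""
    return [toy_vocab.get(ch, TOY_UNK_ID) for ch in text]


missing = chars_mapped_to_unk(["joão", "camarão"], toy_encode, TOY_UNK_ID)
print(missing)
```

With Hugging Face transformers installed, the same check could presumably be run against the real tokenizer by passing `lambda s: tok.encode(s, add_special_tokens=False)` and `tok.unk_token_id`, where `tok = AutoTokenizer.from_pretrained("t5-base")`.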
I have also tried mT5 (based on the GitHub repository from this article: https://towardsdatascience.com/how-to-train-an-mt5-model-for-translation-with-simple-transformers-30ba5fa66c5f). mT5 can handle the non-English characters, but it seems, although I am not 100% sure, that the model only uses the vocabulary it has seen during training. Can anyone confirm this? If so, it is useless in my case, because the model I am trying to train should be able to split unseen words into syllables.
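One way to check this empirically rather than guess would be to hold out some words entirely and measure how many of them the fine-tuned model splits correctly. The following is only a sketch under that assumption: `split_unseen`, `exact_match`, and the `predict` callable (word in, target string out) are hypothetical names, not part of any library.

```python
import random


def split_unseen(pairs, held_out_fraction=0.2, seed=0):
    """Split (word, target) pairs so that held-out words never appear in training."""
    # Deduplicate by word so the same word cannot land on both sides of the split.
    unique = sorted(dict(pairs).items())
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_held_out = max(1, int(len(unique) * held_out_fraction))
    return unique[n_held_out:], unique[:n_held_out]


def exact_match(predict, held_out):
    """Fraction of held-out words whose predicted split equals the reference exactly."""
    hits = sum(predict(word) == target for word, target in held_out)
    return hits / len(held_out)
```

If the exact-match score on the held-out words is close to the score on the training words, the model is generalizing to unseen words; if it collapses, it is mostly memorizing.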
If anyone can point out a solution to my problem, I would be very grateful. T5 would do the job if it recognized â, ó, ã, ê, í, etc.
Thanks a lot