Following the example of other language communities, this is the introduction thread for Thai NLP practitioners using HuggingFace libraries! To get started, please feel free to introduce yourself with any of the following:
Your name, Github, Hugging Face, and/or Twitter handle
Some projects you are working on or interested in starting
Any potential directions you have in mind for the Hugging Face Thai community
Anything else you’d like to share!
As for me, I am Sakares (@sakares on GitHub). I work on NLP across a broad spectrum: reading papers and re-implementing parts of them, contributing pull requests to Hugging Face and other open-source NLP projects, and trying out Kaggle NLP competitions in my spare time.
Currently, I work in the Lab team at Digital Office, SCG, in Thailand. Most of my work focuses on building proofs of concept and delivering NLP solutions for existing gaps in the business. My current project focuses on Speech-to-Text model and app development, as well as the Question-Answering domain.
Hi, K. @sakares! This is Jung from ThaiKeras and Kaggle krub.
I have not been working on Thai NLP in particular, but on NLP in general with TensorFlow.
I will be very happy if I can help with anything krub.
I have contributed two Tensorflow models on Huggingface which are
Hi, K’ @Jung, I have read your article explaining the Hugging Face Transformers model landscape, and it was awesome! Also, I have not yet tried the DPR model on TF, and it seems interesting! Thanks for coming here.
Hi, I am Jon, a PhD student in computer engineering at MFU in Chiang Rai (also at the University of Hawaii). (Twitter: @FernquestJon; GitHub: jonfernq)
I plan to use Thai-language BERT transformer models to improve machine translation of the Ratchathirat, a Thai-language epic.
(Note: this epic exists in Burmese, Mon, Thai, and Pali versions.)
Paging through the national library's version of the text (https://vajirayana.org/), one gets an automatic translation from Google Translate, but it is very rough, mostly because it lacks the specialized vocabulary of that language domain.
My hope is that, with suitable pre-training of a Thai-language BERT model, a much better machine translation can be derived.
This is probably on the wish list of many language scholars. As far as I know, it cannot yet be done for Sanskrit, where the Digital Corpus of Sanskrit (DCS) is about as good as leading scholars can get at the moment, but I believe this barrier will be removed in the near future.