Thai NLP - Introductions

สวัสดีครับ! (Hello!) :hugs: :thailand:

Following the example of other language communities, this is the introduction thread for Thai NLP practitioners using HuggingFace libraries! To get started, please feel free to introduce yourself with any of the following:

  • Your name and your GitHub, Hugging Face, and/or Twitter handle
  • Some projects you are working on or interested in starting
  • Any potential directions you have in mind for the Hugging Face Thai community
  • Anything else you’d like to share!

As for me, I am Sakares (:link: @sakares on GitHub). I work across a broad spectrum of NLP: reading papers and re-implementing parts of them, making pull requests back to Hugging Face and other open-source NLP projects, and trying out Kaggle competitions in my spare time for NLP challenges.

Here is one of my PRs back to HF :hugs::
https://github.com/huggingface/transformers/pull/3277

Currently, I work on the Lab team in the Digital Office at SCG in Thailand. Most of my work focuses on building proofs of concept and delivering NLP solutions for various existing gaps in the business. My current projects focus on speech-to-text model/app development and the question-answering domain.

Yeah, nice to e-meet you all :hugs:


Hi K. @sakares! This is Jung from ThaiKeras and Kaggle krub :smiley:
I have not been working on Thai NLP in particular, but on NLP in general with TensorFlow.
I will be very happy to help with anything krub.

I have contributed two TensorFlow models to Hugging Face:

TFDPR
https://github.com/huggingface/transformers/pull/8203

and

TFRAG
https://github.com/huggingface/transformers/pull/9002

The two combined have already consumed 500+ hours of my time, and I just finished a tough Kaggle competition, so I am currently on a one-month break krub :grin:


Hi K’ @Jung, I have read your article explaining the Hugging Face Transformers model landscape, and it was awesome! Also, I have never tried the DPR model on TF yet, and it seems interesting! Thanks for dropping by. :grinning:


Hi, I am Jon, a PhD student in computer engineering at MFU in Chiang Rai
(also at the University of Hawaii). (Twitter: @FernquestJon; GitHub: jonfernq)

The specific task I plan to employ Thai-language BERT transformer models for is improving the machine translation of the Thai-language epic Ratchathirat.
(Note: this epic exists in Burmese, Mon, Thai, and Pali versions.)

One gets an automatic translation from Google Translate while paging through the text in the national library version (https://vajirayana.org/), but it is very rough (mostly due to a deficient specialized vocabulary for that domain).

My hope is that, with suitable pre-training of a Thai-language BERT model, a much better machine translation can be derived.

This is probably on the wish list of many language scholars. I know it can’t be done with Sanskrit yet; the Digital Corpus of Sanskrit (DCS), an online Sanskrit dictionary and annotated corpus, is about as good as leading scholars can get at the moment. But I believe this barrier will be removed in the near future.