Thai NLP - Introductions

sakares · February 23, 2021, 4:22am

Hello! (สวัสดีครับ!)

Following the example of other language communities, this is the introduction thread for Thai NLP practitioners using Hugging Face libraries! To get started, please feel free to introduce yourself with any of the following:

Your name, GitHub, Hugging Face, and/or Twitter handle
Some projects you are working on or interested in starting
Any potential directions you have in mind for the Hugging Face Thai community
Anything else you’d like to share!

As for me, I am Sakares (@sakares on GitHub). I work on NLP across a broad spectrum: reading papers and re-implementing parts of them, making pull requests back to Hugging Face and other open-source NLP projects, and trying out Kaggle NLP competitions in my spare time.

Here is my PR to Hugging Face Transformers:
https://github.com/huggingface/transformers/pull/3277

Currently, I am working in the Lab team at Digital Office, SCG, in Thailand. Most of my work focuses on building proofs of concept and delivering NLP solutions that fill existing gaps in the business. My current projects focus on speech-to-text model and app development, as well as the question-answering domain.

Nice to e-meet you all!

Jung · March 3, 2021, 6:12am

Hi, K. @sakares! This is Jung from ThaiKeras and Kaggle krub.
I have not been working on Thai NLP in particular, but on NLP in general with TensorFlow.
I will be very happy to help with anything krub.

I have contributed two TensorFlow models to Hugging Face Transformers:

TFDPR
https://github.com/huggingface/transformers/pull/8203

and

TFRAG
https://github.com/huggingface/transformers/pull/9002

The two combined have already consumed 500+ hours of my time, and I just finished one tough Kaggle competition, so I am currently on a month-long break krub.

sakares · March 3, 2021, 11:54am

Hi, K. @Jung! I have read your article explaining the Hugging Face Transformers model landscape, and it was awesome! Also, I have not tried the DPR model on TF yet, and it seems interesting! Thanks for coming here.

jonfernquest · October 10, 2022, 9:42am

Hi, I am Jon, a PhD student in computer engineering at MFU university in Chiang Rai
(also at the University of Hawaii). (Twitter: @FernquestJon; GitHub: jonfernq)

The specific task I plan to attempt with Thai-language BERT transformer models is improving the machine translation of the Thai-language epic Ratchathirat.
(note: this epic exists in Burmese, Mon, Thai and Pali language versions).

One gets an automatic translation from Google Translate as one pages through the text in the national library version (https://vajirayana.org/), but it is very rough, mostly because of a deficient specialized vocabulary for that language domain.

My hope is that, with suitable pre-training of a Thai-language BERT model, a much better machine translation can be derived.

This is probably on the wish list of many language scholars. As far as I know, it can’t be done with Sanskrit yet, with the Digital Corpus of Sanskrit (DCS) being about as good as leading scholars can get at the moment, but I believe this barrier will be removed in the near future.
