Bengali NLP - Introductions

This is the introduction thread for Bengali NLP practitioners.

Welcome! Please introduce yourself and let us know:

  • Your name, Github, Hugging Face, and/or Twitter handle
  • Your interest in Bengali NLP
  • Some projects you are working on or interested in starting
  • Any other languages that you speak, any personal interests, anything else really :wink:

I am Tanmoy Sarkar
[Github link] [HuggingFace ID] [Twitter]
I am a final-year undergraduate computer science and engineering student. My research areas are deep learning and accelerated computing. I am currently learning NLP and quantum computing.

Why Bengali NLP
NLP in Bengali comes with its own set of challenges. Bengali has a large number of letters and vowels, and unlike English and other majority languages, it can form conjunct letters by combining two or more letters. Despite being a member of the Indo-European language family, Bengali has its own characteristics, ways of expression, and sentence structures. Perhaps most importantly, if we can solve NLP for Bengali, that work could carry over to closely related languages such as Hindi and Sanskrit. That would be a huge achievement, since there are approximately 1 billion speakers of Bengali and Hindi combined in India, Bangladesh, Pakistan, and elsewhere.
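To make the point about conjunct letters concrete, here is a small Python sketch (an illustration only, not part of any dataset or tool mentioned in this thread) showing how the common conjunct ক্ষ is stored in Unicode: two consonants joined by the virama sign (hasanta). A tokenizer that works on raw code points sees three symbols where a reader usually sees a single letter.

```python
import unicodedata

# The conjunct "ক্ষ" in its standard Unicode encoding:
# KA + VIRAMA (hasanta) + SSA, i.e. three code points.
conjunct = "\u0995\u09cd\u09b7"

# Print each code point with its official Unicode name.
for ch in conjunct:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Usually rendered as one glyph, but the string holds 3 code points.
print(len(conjunct))
```

Running it lists U+0995 BENGALI LETTER KA, U+09CD BENGALI SIGN VIRAMA, and U+09B7 BENGALI LETTER SSA, followed by 3.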

Projects I am currently working on
Despite Bengali being a rich language, there are very few good open-source datasets available for it, so I have decided to create my own. We have formed a small community in which some developers are gathering quality data from classic novels, articles, etc., and annotating it. It might take some time to finish, but we are very excited about the possibilities of NLP in the Bengali language and about this dataset. I am also planning some other community projects, which I will talk about later.

A little more about me
I am from India, and I speak Bengali (native), English, and Hindi. I am still new to NLP and very focused on it. Other than that, I like doing literally everything, from blockchain to deep learning and from accelerated computing to quantum ML :grinning: I also love music :hatching_chick:


Hello Everyone!

I'm Khalid Saifullah, a CS undergrad student from Dhaka, Bangladesh. I take great interest in deep learning research, visualizations, and a bit of photography.
I have only just started learning about NLP and reading its papers; until now I've mostly read papers on, and implemented models for, computer vision. NLP seems to be a vast field with complex problems to solve, and a lot more still needs to be done, especially in languages other than English.

Really happy to join the community, hope we can all contribute to our mother tongue together.

Social Links:

Personal Website


Hey Tanmoy!
Is there a Discord/Slack group for the community collecting Bengali data?

Would love to get involved.


@khalidsaifullaah @partham Thank you both for your interest. Here is a Discord group link for our little community -, and we have already started a bit of work, although we are still figuring out the annotation process. It would be awesome if you join us, and then we will figure something out together :hugs: Ping me there after joining, and I will add you to the Bengali-Dataset private channel.


Hi, I am from Bangladesh. I am also interested in collecting a large dataset for Bangla.


Hey, I am Zaid. I mostly work with Arabic, and I am also interested in MRLs (morphologically rich languages). I created this thread to discuss such interesting and difficult languages. I am interested in learning how similar or distant such languages are in linguistic terms.


I am Sagor Sarker
Github Profile: sagorbrur (Sagor Sarker) · GitHub
Huggingface Profile: sagorsarker (Sagor Sarker)
Twitter Profile:
I have completed my BSc in Engineering (CSE) and am currently working as an Artificial Intelligence Engineer.
My areas of expertise are NLP, deep learning, and machine learning.
At present I am not working on any ongoing project, but I am interested in building various transformer-based NLP models for Bengali.
I am always open to collaborate.
Thanks and regards to all Bengali NLP practitioners.


Hey, I checked your profile, and your work is really impressive. We have a small community, and we have already started working towards creating a huge Bengali dataset. You might want to join us:


Yeah, that is a great idea. Let me know your upcoming plans, and then we can discuss them.


Hi, my name is Antaripa Saha, and I am a third-year undergraduate student at NIT Agartala. I have been working on NLP projects for the last 3-4 months. I am working as an ML engineer at Omdena.

I am interested in working on multilingual projects, and I am also new to Hugging Face. I would love to connect with others working on Bengali NLP projects.
I speak Bengali, Hindi, and English, and a little German and Marathi.

Hugging face: itz-antaripa
GitHub: Itz-Antaripa
Twitter: @doesdatmaksense


I am Nilavya, a pre-final-year student from India. I have a great interest in DL and NLP,
and I would love to work on Bengali NLP.



You can join the Discord community mentioned above. We have already started working on the main dataset and a little on tokenizers.


Hi, I am Soham Datta, currently working at TCS Research in the area of Data and Decision Sciences. I am working on NLP applications, but being Bengali, I want to transfer my learning (pun intended) to Bengali.


Thank you for your interest. Join the community via the link mentioned above, and let's discuss there :hugs: