I am Tanmoy Sarkar
[Github link] [HuggingFace ID] [Twitter]
I am final year undergrad computer science and engineering student. My area of research is Deep learning, accelerated computing. Currently learning NLP, Quantum computing.
NLP in Bengali comes with different set of challenges. Bengali has large number of letters, vowels. Unlike English or other majority languages it can form conjugate letters combining two or more letters. In spite of being member of Indo-European linguistics Bengali has different set of characteristics, way of expressions, structure of sentences. And the most important thing may be if we can solve NLP in Bengali we can even solve other languages like Hindi, Sanskrit etc. as these are all similar and solving this would be a huge achievement as there are approximately 1 billion native speaker in Bengali and Hindi combined in India, Bangladesh, Pakistan etc.
Projects I am currently working on
In spite of being a rich language there are very few good open source dataset available on Bengali. So I have decided to create my own dataset. We have formed a small community where there are some developers working on gathering quality datasets from classic novels, articles etc. along with the dataset annotation. It might take some time to finish that task but we are pretty much excited thinking about the possibility of NLP in Bengali Language and this dataset. Also I am planning some other community projects which I will be talking about later.
A little more about me
I am from India and I speak in Bengali(Native), English and Hindi. I am still new to NLP and I am very much focused towards it. Other than that I like doing literally everything like from blockchain to deep learning, from accelerated computing to Quantum ML also I love music