Japanese NLP - Introductions

This is the introduction thread for Japanese NLP practitioners.

Welcome! Please introduce yourself and let us know:

  • Your name, GitHub, Hugging Face, and/or Twitter handle
  • Your interest in Japanese NLP
  • Some projects you are working on or interested in starting
  • Any other languages that you speak, any personal interests, anything else really :wink:

I’m Yusuke Mori.
My GitHub account: forest1988

My interest in Japanese NLP:
Although I mainly work with English texts in my research, Japanese is my first language and the one I use every day.
I was involved in the Japanese translation project for spacy-course. I would be pleased to contribute to “Languages at Hugging Face”.

Projects I am working on or interested in starting:
I’m currently working on the documentation for BERT Japanese.

Any other languages:
I am not very good, but I can communicate in English to some extent.


Hello, my name is Paul O’Leary McCann. I’m the creator of fugashi, the MeCab wrapper used for Japanese tokenization in some BERT models. I also maintain mecab-python3, which was used in Transformers before fugashi. In addition, I have done a lot of work on the Japanese models in spaCy, and last year I worked on improving speed in SudachiPy.

My interest in Japanese NLP:

I live in Japan and have worked in Japanese industry for years, including the past year and a half as an independent consultant. I think that while there are good tools for handling the unique challenges Japanese presents, there are often issues with ease of use or maintenance, so a lot of my open source work has focused on that.

Projects I am working on or interested in starting:

I just released Kanji Club, a kanji search site, though it isn’t a machine learning project. Other than that, I’m not working on anything in Transformers actively at the moment but I enjoy keeping up with community developments and look forward to seeing what people come up with! Please do feel free to @ me if you have any trouble with or questions about the MeCab wrappers.

Actually, if the Livedoor News Corpus isn’t in datasets yet it should probably be added, so that’s a project idea…

Some of my projects:

  • Kanji Club: A newly released site for instant kanji search by parts
  • fugashi: Pythonic Cython-based MeCab wrapper
  • cutlet: A library to Romanize Japanese

Elsewhere online:
