Japanese NLP - Introductions

This is the introduction thread for Japanese NLP practitioners.

Welcome! Please introduce yourself and let us know:

  • Your name, GitHub, Hugging Face, and/or Twitter handle
  • Your interest in Japanese NLP
  • Some projects you are working on or interested in starting
  • Any other languages that you speak, any personal interests, anything else really :wink:

I’m Yusuke Mori.
My GitHub account: forest1988

My interest in Japanese NLP:
Although my research mainly deals with English texts, Japanese is my first language and the one I use daily.
I was involved in the Japanese translation project for spacy-course. I would be pleased if I could contribute to “Languages at Hugging Face”.

Projects I am working on or interested in starting:
I’m currently working on the documentation for BERT Japanese.

Any other languages:
My English is not very good, but I can communicate in it to some extent.


Hello, my name is Paul O’Leary McCann. I’m the creator of fugashi, the MeCab wrapper that is used for Japanese tokenization in some BERT models. I also maintain mecab-python3, which was used in Transformers before fugashi. I have also done a lot of work on the Japanese models in spaCy, and I worked on improving speed in SudachiPy last year.

My interest in Japanese NLP:

I live in Japan and have worked in Japanese industry for years, and as an independent consultant for the past year and a half. I think that while there are good tools for working with the unique challenges Japanese presents, there are often issues with ease of use or maintenance, so a lot of my open source work has focused on that.

Projects I am working on or interested in starting:

I just released Kanji Club, a kanji search site, though it isn’t a machine learning project. Other than that, I’m not working on anything in Transformers actively at the moment but I enjoy keeping up with community developments and look forward to seeing what people come up with! Please do feel free to @ me if you have any trouble with or questions about the MeCab wrappers.

Actually, if the Livedoor News Corpus isn’t in datasets yet it should probably be added, so that’s a project idea…

Some of my projects:

  • Kanji Club: A just-released instant kanji search-by-parts site
  • fugashi: Pythonic Cython-based MeCab wrapper
  • cutlet: A library to Romanize Japanese

Hello, I’m Atsunori Fujita.

My interest in Japanese NLP:
I would like to add more Japanese language models so that Japanese developers can easily build Japanese language solutions.

Some projects you are working on or interested in starting:
I am planning to add a Japanese pre-trained model (considering XLNet and RoBERTa).

Building datasets and developing tokenizers are key concerns in my planning. I would be happy to cooperate with you.


The documentation of BERT Japanese is included in the v4.6.0 release!

Hello, I am Kazuma Takaoka.
I am a developer of Japanese morphological analyzers, Sudachi and SudachiPy.
We have recently started a project to use SudachiPy as a tokenizer for Transformers, and we are planning to release models using Sudachi.

We welcome your comments on our project.

