Japanese NLP - Introductions

This is the introduction thread for Japanese NLP practitioners.

Welcome! Please introduce yourself and let us know:

  • Your name, GitHub, Hugging Face, and/or Twitter handle
  • Your interest in Japanese NLP
  • Some projects you are working on or interested in starting
  • Any other languages that you speak, any personal interests, anything else really :wink:
2 Likes

I’m Yusuke Mori.
My GitHub account: forest1988

My interest in Japanese NLP:
Although I mainly work with English texts as my research subject, my first language, which I use daily, is Japanese.
I was involved in the Japanese translation project of spacy-course. I would be pleased if I could contribute to “Languages at Hugging Face”.

Projects I am working on or interested in starting:
I’m now working on the documentation of BERT Japanese.
https://github.com/huggingface/transformers/issues/9035

Any other languages:
I am not very good, but I can communicate in English to some extent.

5 Likes

Hello, my name is Paul O’Leary McCann. I’m the creator of fugashi, the MeCab wrapper which is used for Japanese tokenization in some BERT models. I also maintain mecab-python3, which was used in Transformers before fugashi. I have also done a lot of work on the Japanese models in spaCy and worked on improving speed in SudachiPy last year.

My interest in Japanese NLP:

I live in Japan and have worked in Japanese industry for years, and as an independent consultant for the past year and a half. I think that while there are good tools for working with the unique challenges Japanese presents, there are often issues with ease of use or maintenance, so a lot of my open source work has focused on that.

Projects I am working on or interested in starting:

I just released Kanji Club, a kanji search site, though it isn’t a machine learning project. Other than that, I’m not working on anything in Transformers actively at the moment but I enjoy keeping up with community developments and look forward to seeing what people come up with! Please do feel free to @ me if you have any trouble with or questions about the MeCab wrappers.

Actually, if the Livedoor News Corpus isn’t in datasets yet it should probably be added, so that’s a project idea…

Some of my projects:

  • Kanji Club: A just-released site for instant kanji search by parts
  • fugashi: Pythonic Cython-based MeCab wrapper
  • cutlet: A library to Romanize Japanese

4 Likes

Hello, I’m Atsunori Fujita.

My interest in Japanese NLP:
I would like to add more Japanese language models so that Japanese developers can easily build Japanese language solutions.

Some projects you are working on or interested in starting:
I am planning to add a Japanese pre-trained model (considering XLNet and RoBERTa).

Building datasets and developing tokenizers are crucial concerns in my planning. I would be happy to cooperate with you.

2 Likes

The documentation of BERT Japanese is included in the v4.6.0 release!

1 Like

Hello, I am Kazuma Takaoka.
I am a developer of Japanese morphological analyzers, Sudachi and SudachiPy.
We have recently started a project to use SudachiPy as a tokenizer for transformers, and we are planning to release models using Sudachi.

We welcome your comments on our project.

1 Like

Hi, NLPers! My name is Shunsuke Kitada. I’m a Ph.D. student interested in deep learning for NLP and in multimodal fields combining NLP and computer vision. @whitphx san told me about this awesome thread on Twitter. Like you guys, one of my interests is Japanese NLP.

Lately, I have been favoring Hugging Face (HF) datasets. HF datasets can be published separately from the cumbersome data loader part, and they have a simple, very easy-to-use interface. We hope they will be used by a wide variety of people.

I’m currently releasing implementations of several Japanese datasets/benchmarks to be made available on HF datasets. Here are some of them:

In addition to these datasets, I am planning to release HF datasets implementations for other Japanese language datasets as well. I hope that these activities contribute to boosting the Japanese NLP field.

2 Likes

Welcome @shunk031 san!
I hope to liven up this thread, and I really appreciate you joining us here!

Let’s have an interesting discussion!

1 Like

Hi, I’m Kaito Sugimoto from The University of Tokyo (GitHub: kaisugi)

I’ve been maintaining an article titled フリーで使える日本語の主な大規模言語モデルまとめ (“Summary of the main freely usable Japanese large language models”), which collects various pre-trained language models specific to Japanese.

Although the recent trend is toward very-large-scale models like ChatGPT, I believe there is still some value in gathering information about “medium-size” models we can fine-tune on our own.
If you want to add a new model or make a correction, feel free to comment on the article (or contact me via hellorusk1998[at]gmail.com ) :blush:

2 Likes

Welcome @kaisugi san!
I often look at your site and use it as a reference. Thank you for maintaining such a useful information source and I’m so happy to see you in this thread!

2 Likes

Hello,

My name is Akim Mousterou. I was born and raised in Paris, France, and I am 39 years old. I currently live in the City of Light but move around a lot (SF, HK, and Tokyo). My work in NLP revolves mostly around NER and knowledge bases for strategic insights. Apart from NLP, I am passionate about network effects, alternative datasets, and quantitative research.

My interest in Japanese NLP:
At university, I did Japanese studies, and I have a Master’s degree in multilanguage engineering, NLP with a focus on the Japanese language. From 2009 to 2010, I was in Tokyo, Japan on a working holiday visa. Over the years, I have worked as a business consultant with European companies in Japan and a few Japanese companies.
Recently, I passed the JLPT N2 for fun, but my Japanese is a little bit rusty. : )

My research in NLP & Quantitative research:
I shared a few notions on my GitHub about the specificities of Japanese in NLP, which you may already be aware of.
Topics include NER specificities in Japanese for Masa of SoftBank on Twitter, testing the Whisper ASR model on Uniqlo earnings calls, and an introduction to Quantum NLP for Japanese → AkimParis · GitHub

My latest project is an Anki deck with around 400 words in Japanese (English and French) about machine learning, statistics, and natural language processing to promote communication among NLP practitioners. → Vocabulary Japanese (En/Fr) about Machine Learning & NLP/CV - AnkiWeb

Please feel free to connect on LinkedIn (Akim Mousterou) or GitHub (AkimfromParis).

よろしくお願いします~! (Pleased to meet you!)

Akim

2 Likes

Welcome @AkimfromParis san!
Sorry for the late comment.

Thank you for sharing your work!
Promoting communication among NLP practitioners (and team members with different expertise) is what I am interested in, and your deck seems helpful!

どうぞ宜しくお願いします! (Pleased to meet you, too!)

Yusuke

1 Like

It seems that I cannot update my self-introduction post, so let me add some updated information.


I’m Yusuke Mori.
In 2021, I got my Ph.D. in the field of Information Science and Technology.
Now I am working as a researcher in the field of NLP.

My interest in Japanese NLP:
I am interested in storytelling, machine learning, and natural language processing, which I believe have a tight relationship with creativity.

Please visit my website if you are interested in the following topics:

  • COMPASS (a writing support system to COMPlement Author unaware Story gapS)
  • Missing Position Prediction