Japanese NLP - Introductions

yusukemori · February 22, 2021, 1:43am

This is the introduction thread for Japanese NLP practitioners.

Welcome! Please introduce yourself and let us know:

Your name, GitHub, Hugging Face, and/or Twitter handle
Your interest in Japanese NLP
Some projects you are working on or interested in starting
Any other languages that you speak, any personal interests, anything else really

yusukemori · February 22, 2021, 1:54am

I’m Yusuke Mori.
My GitHub account: forest1988

My interest in Japanese NLP:
Although I mainly work with English texts as my research subject, my first language, which I use daily, is Japanese.
I was involved in the Japanese translation project of spacy-course. I would be pleased if I could contribute to “Languages at Hugging Face”.

Projects I am working on or interested in starting:
I’m now working on the documentation of BERT Japanese.
https://github.com/huggingface/transformers/issues/9035

Any other languages:
I am not very good, but I can communicate in English to some extent.

polm · February 24, 2021, 11:12am

Hello, my name is Paul O’Leary McCann. I’m the creator of fugashi, the MeCab wrapper which is used for Japanese tokenization in some BERT models. I also maintain mecab-python3, which was used in Transformers before fugashi. I have also done a lot of work on the Japanese models in spaCy and worked on improving speed in SudachiPy last year.

My interest in Japanese NLP:

I live in Japan and have worked in Japanese industry for years, and as an independent consultant for the past year and a half. I think that while there are good tools for working with the unique challenges Japanese presents, there are often issues with ease of use or maintenance, so a lot of my open source work has focused on that.

Projects I am working on or interested in starting:

I just released Kanji Club, a kanji search site, though it isn’t a machine learning project. Other than that, I’m not working on anything in Transformers actively at the moment but I enjoy keeping up with community developments and look forward to seeing what people come up with! Please do feel free to @ me if you have any trouble with or questions about the MeCab wrappers.

Actually, if the Livedoor News Corpus isn’t in datasets yet it should probably be added, so that’s a project idea…

Some of my projects:

Kanji Club: Just released instant kanji search-by-parts site
fugashi: Pythonic Cython-based MeCab wrapper
cutlet: A library to Romanize Japanese

Elsewhere online:

My homepage: https://dampfkraft.com
GitHub: polm (Paul O'Leary McCann)
Twitter: https://twitter.com/polm23

Atsunori · March 22, 2021, 8:57am

Hello, I’m Atsunori Fujita.

My GitHub: AtsunoriFujita (atfujita)

My interest in Japanese NLP:
I would like to add more Japanese language models so that Japanese developers can easily build Japanese language solutions.

Some projects you are working on or interested in starting:
I am planning to add a Japanese pre-trained model (considering XLNet or RoBERTa).

Building datasets and developing tokenizers are crucial concerns in my planning. I would be happy to cooperate with you.

yusukemori · May 15, 2021, 8:22am

The documentation of BERT Japanese is included in the v4.6.0 release!

Add documentation for BertJapanese #11219 (@forest1988)

kazuma-t · July 13, 2021, 1:53am

Hello, I am Kazuma Takaoka.
I am a developer of the Japanese morphological analyzers Sudachi and SudachiPy.
We have recently started a project to use SudachiPy as a tokenizer for transformers, and we are planning to release models using Sudachi.

We welcome your comments on our project.

shunk031 · February 28, 2023, 4:15pm

Hi, NLPers! My name is Shunsuke Kitada. I’m a Ph.D. student interested in deep learning for NLP and in multimodal fields combining NLP and computer vision. @whitphx san told me about this awesome thread on Twitter. Like you guys, one of my interests is Japanese NLP.

Lately, I have been favoring Hugging Face (HF) Datasets. HF datasets can be published separately from the cumbersome data loader part, and have a simple and very easy-to-use interface. We hope they will be used by a wide variety of people.

I’m currently releasing implementations of several Japanese datasets/benchmarks to be made available on HF datasets. Here are some of them:

In addition to these datasets, I am planning to release HF datasets implementations for other Japanese language datasets as well. I hope that these activities contribute to boosting the Japanese NLP field.

yusukemori · February 28, 2023, 4:54pm

Welcome @shunk031 san!
I hope to enliven this thread, and I really appreciate you joining us here!

Let’s have an interesting discussion!

kaisugi · March 1, 2023, 3:26am

Hi, I’m Kaito Sugimoto from The University of Tokyo (GitHub: kaisugi).

I’ve been maintaining an article titled フリーで使える日本語の主な大規模言語モデルまとめ (“A Summary of the Main Freely Available Japanese Large Language Models”), which collects various pre-trained language models specific to Japanese.

Although the recent trend is toward very-large-scale models like ChatGPT, I believe there is still some value in gathering information about “medium-size” models we can fine-tune on our own.
If you want to add a new model or make a correction, feel free to comment on the article (or contact me via hellorusk1998[at]gmail.com).

yusukemori · March 1, 2023, 4:17am

Welcome @kaisugi san!
I often look at your site and use it as a reference. Thank you for maintaining such a useful information source and I’m so happy to see you in this thread!

AkimfromParis · March 11, 2023, 5:47pm

Hello,

My name is Akim Mousterou. I was born and raised in Paris, France, and I am 39 years old. I am currently living in the City of Light but moving around a lot (SF, HK, and Tokyo). My work in NLP revolves mostly around NER and knowledge bases for strategic insights. Apart from NLP, I am passionate about network effects, alternative datasets, and quantitative research.

My interest in Japanese NLP:
At university, I did Japanese studies, and I have a Master’s degree in multilanguage engineering and NLP with a focus on the Japanese language. From 2009 to 2010, I was in Tokyo, Japan, on a working holiday visa. Over the years, I have worked as a business consultant with European companies in Japan and a few Japanese companies.
Recently, I passed the JLPT N2 for fun, but my Japanese is a little bit rusty. : )

My research in NLP & Quantitative research:
I have shared a few notes on my GitHub about the specificities of Japanese in NLP, which you may well already be aware of:
NER specificities in Japanese for Masa of Softbank on Twitter, testing of ASR Whisper on earnings calls of Uniqlo, and an introduction to Quantum NLP for Japanese → AkimParis · GitHub

My latest project is an Anki deck with around 400 words in Japanese (English and French) about machine learning, statistics, and natural language processing to promote communication among NLP practitioners. → Vocabulary Japanese (En/Fr) about Machine Learning & NLP/CV - AnkiWeb

Please feel free to connect on LinkedIn (Akim Mousterou) or GitHub (AkimfromParis).

よろしくお願いします～! (Nice to meet you all, and thank you in advance!)

Akim

yusukemori · May 12, 2023, 12:44am

Welcome @AkimfromParis san!
Sorry for the late comment.

Thank you for sharing your work!
Promoting communication among NLP practitioners (and team members with different expertise) is what I am interested in, and your deck seems helpful!

どうぞ宜しくお願いします！ (I look forward to working with you!)

Yusuke

yusukemori · May 12, 2023, 12:51am

It seems that I cannot update my self-introduction post, so let me add some updated information.

I’m Yusuke Mori.
In 2021, I got my Ph.D. in the field of Information Science and Technology.
Now I am working as a researcher in the field of NLP.

My interest in Japanese NLP:
I am interested in storytelling, machine learning, and natural language processing, which I believe have a tight relationship with creativity.

Please visit my website if you are interested in the following topics:

COMPASS (a writing support system to COMPlement Author unaware Story gapS)
Missing Position Prediction

sin2piusc · April 30, 2024, 12:55am

I’m a grad student in Cognitive and Brain Science at USC (a university in California). Right now I’m primarily working on Whisper. I’m trying to create an accurate model for Japanese translations of things related to pop culture, such as TV anime.
