Dutch NLP - Introductions

This is the introduction thread for Dutch NLP practitioners.

Welcome! Please introduce yourself and let us know:

  • Your name, GitHub, Hugging Face, and/or Twitter handle
  • Your interest in Dutch NLP
  • Some projects you are working on or interested in starting
  • Any other languages that you speak, any personal interests, anything else really :wink:

To kick things off, let me introduce myself a bit :wink:

I’m Thomas Dehaene, and I work as an ML Engineer at a company called ML6.

In my day-to-day job, I work a lot with Dutch NLP-related topics of all sorts (summarization, entity extraction, text generation, general text data analysis, etc.). In general I’m super interested in all things multilingual NLP!

Some recent work I helped on (together with the team) involved:

  • A small NLP analysis on some political documents (link)
  • Finetuning a Dutch GPT2 model (link)

Also up for a chat :nerd_face:, you can ping me on:

  • here, obviously
  • Twitter (@TDehaene)
  • Github (TDehaene)

Eager to connect with you all :smiley: !


Hi folks,

I am Jordy Van Landeghem, AI/NLP researcher at Contract.fit, and industrial PhD student at the Catholic University of Leuven (Belgium).

My research is focused on Document Understanding tasks (text classification, NER, structured prediction) and how to calibrate predictions (prediction probability ~= correctness), so that we can better rely on finetuned models out-of-the-box.
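
For anyone unfamiliar with calibration, here is a minimal illustration (names and numbers are made up, not from any particular paper) of the expected calibration error, one common way to measure the gap between confidence and correctness:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence and compare each bin's average
    confidence to its empirical accuracy (lower = better calibrated)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # gap between confidence and accuracy, weighted by bin size
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

# toy overconfident model: high confidence, mediocre accuracy -> large ECE
print(expected_calibration_error([0.9, 0.95, 0.8, 0.85], [1, 0, 1, 0]))
```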

Always up for a discussion on how to advance the state-of-the-art. You can find me on:

It would be cool if this thread could lead to some collaborations :slight_smile:
I still believe there is a “missing” model in the model hub, namely a Belgian Transformer-based model, which takes into account the implicit language distribution of Dutch, French, and German (to a lesser degree), while still performing well on the lingua franca English.

Cheers!


Hi everybody, great initiative!

I am Pieter Delobelle, a PhD researcher at the DTAI lab at KU Leuven (Belgium). My research is mostly focused on fairness and bias in NLP, for example how to quantify and mitigate stereotypes in language models.

One example of this is the bias analysis (pdf, page 7) of RobBERT, a state-of-the-art Dutch RoBERTa-based LM that we released last year at EMNLP. Internally, we are using this model for processing resumes and vacancies. We are also using RobBERT to monitor tweets in Belgium, so a Belgian model would be very useful! :slight_smile:
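
If you want to give RobBERT a spin, a minimal fill-mask sketch with transformers looks roughly like this (using the v2 checkpoint id on the hub):

```python
from transformers import pipeline

# RobBERT v2 as published on the Hugging Face hub
fill_mask = pipeline("fill-mask", model="pdelobelle/robbert-v2-dutch-base")

# like RoBERTa, RobBERT uses the <mask> token
for pred in fill_mask("Er staat een <mask> in mijn tuin."):
    print(pred["token_str"], round(pred["score"], 3))
```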

If you have any questions, e.g. about RobBERT or fairness in NLP, you can always find me on:

Feel free to contact me, I’m always open for collaborations :slight_smile:


Hi everyone :smiley:,

I am Jens-Joris Decorte, currently working as an NLP Research Engineer at TechWolf, in the form of a Baekeland PhD in collaboration with the Text-to-Knowledge (T2K) research group at Ghent University.

I just recently kicked off the project and my research is initially focused on industrial applications of language models: feasible training processes on niche domain data, interpretability, and making them more structured (e.g. through combining them with knowledge bases). Currently, I’m working on a method to efficiently fine-tune language models on a niche corpus. I’m looking forward to any kind of discussion on NLP, especially on interpretability in NLP.

Contact:

Looking forward to connecting with you all!

Hi all,

Thanks for starting this, @thomasdehaene! This topic almost feels like a virtual Belgium NLP Meetup, so I don’t want to be left out :wink:

I’m Yves Peirsman, NLPer with a PhD from the University of Leuven and a keen interest in all things language and technology. My company NLP Town helps organizations implement NLP solutions — through consultancy, software development, or a combination of both. We’ve also developed our own labelling tool that helps annotators label text data more effectively.

In the last five years, we’ve worked with many companies, big and small, in a wide range of sectors: medical, legal, financial, HR, education, etc. As we’re based in Belgium, Dutch is one of the languages we work on most, together with many other Western European languages.

Finally, I’m also the organizer of the Belgium NLP Meetups, which I hope will resume after summer.

I’m always open to discussing anything NLP-related, so feel free to contact me through one of these channels:


Hi all!

I’m Niels. I studied business and information systems engineering (Handelsingenieur Beleidsinformatica) at KU Leuven, but I got to know natural language processing during my master’s and wanted to dive deeper into technical programming. I’m currently working as an Applied AI Researcher at Howest, where I work on several VLAIO-supported AI projects.

You might have seen the TAPAS algorithm appearing on Hugging Face’s social media - I was the contributor of that model :slight_smile: Last year I challenged myself: would it be possible to re-implement an algorithm myself? So that’s what I did. I started reading the original TAPAS repo from Google AI (which was written in TF1), and then slowly but surely implemented it in PyTorch. After a while, I considered my implementation good enough and opened up a pull request on the Transformers repo. After a few more weeks - while working closely together with some of the top developers at Hugging Face - my pull request got merged! Google even sent me a small surprise package to thank me. It’s definitely something I recommend to anyone; it’s a great learning experience (not only NLP-related, but also regarding writing quality code, formatting, etc.)! Also, this week TAPAS is featured on the homepage of Hugging Face’s model hub - really cool!
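
For those curious, here is roughly what using TAPAS looks like through the pipeline API. The google/tapas-base-finetuned-wtq checkpoint is one of the fine-tuned models on the hub; the table contents below are just a toy example:

```python
from transformers import pipeline

# TAPAS answers natural-language questions over flat tables
# (note: the TAPAS models may additionally require torch-scatter)
table_qa = pipeline("table-question-answering",
                    model="google/tapas-base-finetuned-wtq")

table = {
    "Model": ["RobBERT", "BERTje", "TAPAS"],
    "Language": ["Dutch", "Dutch", "English"],
}
result = table_qa(table=table, query="Which models are Dutch?")
print(result["answer"])  # e.g. "RobBERT, BERTje"
```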

Also, I’m now part of the core contributors team at Hugging Face :slight_smile: After TAPAS, I helped improve other models, such as Microsoft’s LayoutLM (which you can use to classify scanned documents or extract information from them), and I’m currently working on adding several other models to the library. My main interest is getting to know state-of-the-art algorithms (mostly Transformers, as they are conquering everything right now) and making them available for anyone to use, which is of course totally in line with Hugging Face’s mission.

Fun fact - I’ve spoken to Thomas and Jordy before at a job fair in Ghent and to Yves at an ethical AI conference in Brussels, and I know TechWolf of course - I went to the same high school as Andreas De Neve and we took a mathematics seminar together - send him my regards, Jens-Joris! I guess NLP is a small world :slight_smile: I’m happy to connect with all of you.

My channels:


Welcome Jordy!

Hi Jens-Joris! Thanks for joining in

Hey Yves! Happy to have you here!

Hi Niels, small world indeed :smiley: ! cool to have you here! Congrats on the awesome HF contributions by the way!

Hi Pieter! Cool you could join us :+1:


Hi all, great to see some familiar faces around here!

I’m Thomas Winters, a PhD student researching computational humor and creative artificial intelligence at the DTAI research group (KU Leuven, Belgium). I have mainly focused on symbolic approaches, but now increasingly on transformer models as well. More specifically, some Dutch NLP projects I worked on are:

  • More Dutch Twitterbots than I care to admit (although most are listed here).
  • Pieter and I created RobBERT, the state-of-the-art Dutch BERT model.
  • We used this RobBERT model to show that BERT models are drastically better at humor detection than previous types of language models. By generating “broken” jokes with the same structure & vocabulary as real jokes, we showed that while other language models like LSTMs and CNNs could not distinguish the two types at all (accuracy below random guessing), RobBERT still reached ~90% accuracy (see the sketch after this list).
  • Helped build technologies for Improbotics Flanders, a show where we play improv theatre with a GPT-2-powered robot.
  • Designed Gitta, a template-powered grammar induction algorithm for creating interpretable generative models.
  • Right now, doing research on combining neural and symbolic methods for NLP.
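
As referenced above, here is a loose sketch of that negative-generation idea. This is not the exact procedure from the paper, just the gist: keep a joke’s length and draw replacement words from the joke vocabulary (the jokes below are placeholders):

```python
import random

def break_joke(joke, vocabulary, frac=0.5, seed=42):
    """Swap a fraction of a joke's words for random vocabulary words,
    keeping its length (and thus its rough structure) intact."""
    rng = random.Random(seed)
    words = joke.split()
    for i in rng.sample(range(len(words)), max(1, int(frac * len(words)))):
        words[i] = rng.choice(vocabulary)
    return " ".join(words)

jokes = [
    "Waarom nam het skelet de lift? Omdat hij geen trap meer had.",
    "Wat is groen en plakt aan de muur? Kermit de sticker.",
]
vocab = sorted({w for joke in jokes for w in joke.split()})
print(break_joke(jokes[0], vocab))  # incoherent, but same length & word pool
```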

More info & contact:


Hi everyone,

I’m Karel D’Oosterlinck, a computer science engineering student at Ghent University, currently in my final year.

I had my first experience with NLP back in 2018, when I was working on a Twitter sentiment analysis platform for my bachelor’s thesis. Looking back, I’m very glad I had this first experience with NLP.

Currently, I’m doing a master’s thesis on using multilingual models for low-resource languages (at the T2K research group). Specifically, I’m exploring the power of multilingual models for Dutch coreference resolution and how multilingual models can leverage high-resource coreference data to finetune on a low-resource coreference task. I would also like to thank @thomaswint and @pdelobelle for building RobBERT, I’ve already had a lot of fun with this model :wink:.

I’m looking forward to specializing further in NLP after my master’s thesis. Maybe, if there is enough enthusiasm, we could organize a monthly (or bimonthly) event to discuss (Dutch) NLP?

My endpoints:

Hi everyone,

I’m Nithin, currently working as a Machine Learning Engineer at an Amsterdam-based startup called Amberscript. Last year, I graduated with a master’s in AI from the University of Amsterdam.

My level of Dutch is elementary, but I’m working on improving Dutch ASR as part of my job. During my master’s, I developed a meta-learning-based approach to few-shot word sense disambiguation, where the goal is to learn to disambiguate new words with just a handful of examples. Furthermore, I worked on continual learning and showed that meta-learning methods mitigate catastrophic forgetting and result in an efficient form of continual learning. I’m interested in diverse topics in NLP, and I’m looking to explore end-to-end ASR systems in more depth.

You can find me here:

Hi all,

Thought I’d revive this thread!

I’m Sofie and I’ve been working in NLP since my master’s thesis back in 2006-2007 at UGent. Since then, I’ve done a PhD in BioNLP, worked for some bigger and smaller companies, and am now Lead ML Engineer of the open-source NLP library spaCy.

While Dutch is my native language, I’ve mainly worked on English use cases in the past, but I’m definitely interested in Dutch applications and in making sure spaCy supports Dutch well.
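
For anyone who hasn’t tried it, spaCy’s Dutch support works out of the box. A minimal sketch (after downloading the small Dutch pipeline; the example sentence is just illustrative):

```python
import spacy

# the small Dutch pipeline that ships with spaCy; install it first with:
#   python -m spacy download nl_core_news_sm
nlp = spacy.load("nl_core_news_sm")

doc = nlp("Sofie werkt aan Nederlandse taaltechnologie in België.")
for token in doc:
    print(token.text, token.pos_, token.dep_)       # tagging & parsing
print([(ent.text, ent.label_) for ent in doc.ents])  # named entities
```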

You can find me here:

I hope we’ll soon get to do IRL meetups again!


Hi everyone,

Frederik Durant is my name, fdurant my Hugging Face handle. I’m originally from Brussels, and Dutch is my mother tongue. Hence the interest.

I’ve been dabbling and later working in (and occasionally out of) NLP since 1990 - yes, that’s not a typo. Having been around for so long, I’ve witnessed the evolution from rule-based to statistical to neural approaches in industry first-hand. It’s amazing how the field has evolved from a highly specialized academic discipline to an installable library where you can get crazy stuff done in 10 lines of code.

My interest has always been in building applications based on (among other things) NLP, rather than in the core discipline itself. Consider me a friendly integrator.

Professionally, I currently work as a freelance ML engineer on chatbots in the banking sector, but I can’t say much about that for confidentiality reasons.

I’m here to learn first, and also to contribute as soon as I can.

Feel free to contact me via LinkedIn.

So cool to see these familiar (virtual) faces over here!! @jordyvl @pdelobelle @nlptown @thomaswint @KarelDO @Sofie

You guessed it, my name is Bram. I did a master’s in computational linguistics and one in AI (both at KU Leuven), and received my PhD this year (Ghent University). I’ve mostly focused on the intersection of computational/psycholinguistics, human/machine translation, and broader NLP. For the time being I am a postdoc at Ghent University with a focus on human and machine translation.

Unsurprisingly, I use transformers in my work on MT. However, I have also been using and training models for very niche tasks, e.g. humor detection. Most recently I employed RoBERTa to predict, for each token, how difficult it would be to translate. “Difficulty” here is operationalized as normalized translation duration, and is heavily inspired by psycholinguistic studies. I’ve also used Transformer models to train my own spaCy-backed Universal Dependencies models - which is not necessary anymore, as they are provided out of the box these days. Awesome! Oh, and I am a big Tolkien nerd, so I’ve done some experiments with training a GPT-J-like language model on the collected works of Tolkien :clown_face:

Been using transformers since the pytorch_pretrained_bert days and I’ve been trying my best to contribute and help when I can ever since! (In other NLP-related tools as well.)

I think I’m already in touch with most of you, but feel free to shoot me a message or connect on other platforms!

Good afternoon, NLP bazen (“bosses”)! I am Bram as well, but from the UMC in Utrecht ;). We are mostly working on/interested in clinical language modeling, from NER/entity linking to the extraction of diagnoses from EHRs.

You can find our GitHub organization here: UMC Utrecht · GitHub

This might be of interest to the Dutch/Belgian NLP folks: we’re hosting the Hugging Face webinar on ML Demo.cratization in Belgium tomorrow (30/6/2022). It is an online event that everyone can attend via Teams.

You can find more information on the website, and attendance links in the second paragraph.