Hi guys, a few months ago I participated in Kaggle’s multilingual toxic-classification competition, which requires a multilingual classifier such as XLM-R. To boost performance, I had a simple idea: try other architectures (e.g. GPT-2, ALBERT, etc.) while keeping the multilingual property.
This can be done with a simple class wrapper (sketched below) that:
- extracts the embedded output from XLM-R,
- sends it to the body of the other architecture, and
- fine-tunes the result with the embedding layer frozen.
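Roughly, the wrapper might look like the sketch below. This is a minimal illustration, not the notebook's actual code: it assumes a recent Transformers 4.x TF API (attribute paths like `xlmr.roberta.embeddings` and the embeddings call signature differ across versions), and the mean-pooling head and model names are my own choices for the example.

```python
import tensorflow as tf
from transformers import TFXLMRobertaModel, TFGPT2Model

class XLMGPT2(tf.keras.Model):
    # Illustrative wrapper: frozen XLM-R embeddings feeding a GPT-2 body.
    def __init__(self, num_labels=1, **kwargs):
        super().__init__(**kwargs)
        xlmr = TFXLMRobertaModel.from_pretrained("xlm-roberta-base")
        # Keep only XLM-R's embedding layer, and freeze it.
        self.embeddings = xlmr.roberta.embeddings
        self.embeddings.trainable = False
        # GPT-2 body; hidden sizes must match (768 for both base models).
        self.body = TFGPT2Model.from_pretrained("gpt2")
        self.classifier = tf.keras.layers.Dense(num_labels, activation="sigmoid")

    def call(self, inputs, training=False):
        input_ids, attention_mask = inputs
        # Multilingual token embeddings from XLM-R.
        embeds = self.embeddings(input_ids=input_ids, training=training)
        # Bypass GPT-2's own token embeddings via inputs_embeds;
        # GPT-2 still adds its learned position embeddings on top.
        out = self.body(inputs_embeds=embeds,
                        attention_mask=attention_mask,
                        training=training)
        # Mean-pool the final hidden states for classification.
        pooled = tf.reduce_mean(out.last_hidden_state, axis=1)
        return self.classifier(pooled)
```

The one hard constraint is that the embedding output width must equal the body's hidden size (768 for XLM-R base and GPT-2 small, 1024 for XLM-R large and GPT-2 medium); otherwise you'd need a projection layer in between.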
This Kaggle notebook illustrates how to build XLM-GPT2 in TensorFlow-Keras.
(XLM-ALBERT-large achieves much better results than XLM-GPT2, but the above notebook is much cleaner, so I'm sharing this one.)
The notebook is several months old and uses somewhat dated versions of TF and Transformers, so if you use the latest versions you may need to modify the code a bit. (I haven't had time to check, but I want to share it anyway in the hope it's useful to some of us.)
Please see Version 12 of the notebook for the actual run.