Simple trick to make any architecture handle multiple languages - XLM-X

Hi guys, several months ago I participated in Kaggle's multilingual toxic-comment classification competition, which required a multilingual classifier like XLM-R. To boost performance, I had a simple idea: try different architectures (e.g. GPT-2, ALBERT, etc.) while keeping the multilingual property.

This can be done with a simple class wrapper that (see the sketch after the list):

  1. extracts the embedded output from XLM-R,
  2. sends it to the body of the other architecture,
  3. fine-tunes the result while keeping the embedding layer frozen.
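Here is a minimal sketch of the idea in TensorFlow/Keras with Hugging Face Transformers. It is not the notebook's exact code: the checkpoint names, the classification head, and attribute/call names like `.roberta.embeddings` and `.transformer(inputs_embeds=...)` are assumptions based on a recent Transformers version and may differ in older releases.

```python
# Sketch: XLM-R embeddings feeding a GPT-2 body (assumed recent Transformers API).
import tensorflow as tf
from transformers import TFXLMRobertaModel, TFGPT2Model

MAX_LEN = 192  # example sequence length

xlmr = TFXLMRobertaModel.from_pretrained("xlm-roberta-base")  # hidden size 768
gpt2 = TFGPT2Model.from_pretrained("gpt2")  # hidden size 768 (sizes must match, otherwise add a Dense projection)

input_ids = tf.keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_ids")

# 1. Extract the embedded output from XLM-R's multilingual embedding layer, frozen.
xlmr.roberta.embeddings.trainable = False
embedded = xlmr.roberta.embeddings(input_ids=input_ids)

# 2. Send the embeddings to the body of the other architecture (here, the GPT-2 stack).
hidden = gpt2.transformer(inputs_embeds=embedded)[0]  # last hidden states

# 3. Add a classification head and fine-tune; only the XLM-R embeddings stay frozen.
pooled = tf.keras.layers.GlobalAveragePooling1D()(hidden)
out = tf.keras.layers.Dense(1, activation="sigmoid")(pooled)

model = tf.keras.Model(inputs=input_ids, outputs=out)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```

Freezing the XLM-R embedding layer is what preserves the multilingual token representations while the body of the other architecture is fine-tuned on the task.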

This Kaggle notebook illustrates how to build XLM-GPT2 in TensorFlow/Keras:
https://www.kaggle.com/ratthachat/jigsaw-gpt2-with-xlm-r-embedding

(XLM-ALBERT-large achieved much better results than XLM-GPT2, but the above notebook is much cleaner, so I'm sharing this one.)

The notebook is several months old and uses somewhat dated versions of TF and Transformers, so if you use the latest versions, you may need to modify the code a bit. (I don't have time to check, but I wanted to share it anyway, and hopefully it can be useful to some of us.)

Please see Version 12 for the actual run :slight_smile:
