Hi guys, a few months ago I participated in Kaggle’s multilingual toxic-classification competition, which requires a multilingual classifier such as XLM-R. To boost performance, I had a simple idea: try other architectures (e.g. GPT-2, ALBERT, etc.) while keeping the multilingual property.
This can be done with a simple class wrapper (sketched below) that:
- extracts the embedded output from XLM-R,
- sends it to the body of the other architecture, and
- fine-tunes the result with the embedding layer frozen.
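Roughly, the wrapper might look like the sketch below. This is a minimal illustration, not the notebook's actual code: it assumes a recent Transformers 4.x TF API (attribute paths like `xlmr.roberta.embeddings` and the embeddings call signature differ across versions), and the mean-pooling head and model names are my own choices for the example.

```python
import tensorflow as tf
from transformers import TFXLMRobertaModel, TFGPT2Model

class XLMGPT2(tf.keras.Model):
    # Illustrative wrapper: frozen XLM-R embeddings feeding a GPT-2 body.
    def __init__(self, num_labels=1, **kwargs):
        super().__init__(**kwargs)
        xlmr = TFXLMRobertaModel.from_pretrained("xlm-roberta-base")
        # Keep only XLM-R's embedding layer, and freeze it.
        self.embeddings = xlmr.roberta.embeddings
        self.embeddings.trainable = False
        # GPT-2 body; hidden sizes must match (768 for both base models).
        self.body = TFGPT2Model.from_pretrained("gpt2")
        self.classifier = tf.keras.layers.Dense(num_labels, activation="sigmoid")

    def call(self, inputs, training=False):
        input_ids, attention_mask = inputs
        # Multilingual token embeddings from XLM-R.
        embeds = self.embeddings(input_ids=input_ids, training=training)
        # Bypass GPT-2's own token embeddings via inputs_embeds;
        # GPT-2 still adds its learned position embeddings on top.
        out = self.body(inputs_embeds=embeds,
                        attention_mask=attention_mask,
                        training=training)
        # Mean-pool the final hidden states for classification.
        pooled = tf.reduce_mean(out.last_hidden_state, axis=1)
        return self.classifier(pooled)
```

The one hard constraint is that the embedding output width must equal the body's hidden size (768 for XLM-R base and GPT-2 small, 1024 for XLM-R large and GPT-2 medium); otherwise you'd need a projection layer in between.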
This Kaggle notebook illustrates how to build XLM-GPT2 in TensorFlow-Keras.
(XLM-ALBERT-large achieves much better results than XLM-GPT2, but the above notebook is much cleaner, so I'm sharing this one.)
The notebook is several months old and uses somewhat dated versions of TF and Transformers, so if you use the latest versions you may need to modify the code a bit. (I haven't had time to check, but I want to share it anyway in the hope it's useful to some of us.)
Please see Version 12 of the notebook for the actual run.