XLM classification on a language the model was not pre-trained on

ErlingAmundsen · April 24, 2023, 5:26pm

Hi,

I am currently working on a classification problem using xlm-mlm-17-1280, where we want to compare results on a language the model was not pre-trained on. However, I am running into issues with tokenisation: the gold-label dataset for that language is labeled word by word (a sentence like ‘He isn’t running fast’ is labeled per word: ‘He’, ‘isn’t’, ‘running’, ‘fast’). The XLM tokenizer does not split the text into the same units. What would be a good way to go about this?
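
For illustration, one common way to handle this kind of mismatch is to tokenize each gold word separately, give the word's label to its first subword piece, and mark the remaining pieces with an ignore value so they do not contribute to the loss. The sketch below is only a minimal example under assumptions: `align_labels` and `toy_tokenize` are hypothetical names, the toy tokenizer merely stands in for a real subword tokenizer, and `-100` is assumed as the ignore value because PyTorch's `CrossEntropyLoss` skips it by default.

```python
from typing import Callable, List, Tuple

IGNORE_INDEX = -100  # assumed ignore value; PyTorch's CrossEntropyLoss skips it by default


def align_labels(
    words: List[str],
    labels: List[int],
    tokenize: Callable[[str], List[str]],
    ignore_index: int = IGNORE_INDEX,
) -> Tuple[List[str], List[int]]:
    """Spread word-level labels over subword pieces (hypothetical helper).

    The first piece of each word keeps the word's label; every further
    piece gets `ignore_index`.
    """
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")

    tokens: List[str] = []
    aligned: List[int] = []
    for word, label in zip(words, labels):
        pieces = tokenize(word)
        if not pieces:
            # The tokenizer produced nothing for this word, so skip it.
            continue
        tokens.extend(pieces)
        aligned.append(label)
        aligned.extend([ignore_index] * (len(pieces) - 1))
    return tokens, aligned


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical stand-in for a subword tokenizer: chunks of up to 3 characters."""
    return [word[i:i + 3] for i in range(0, len(word), 3)]


words = ["He", "isn't", "running", "fast"]
labels = [0, 1, 2, 3]  # made-up label ids, one per gold word
tokens, aligned = align_labels(words, labels, toy_tokenize)
print(tokens)
print(aligned)
```

With the real model, the `tokenize` method of the tokenizer loaded for xlm-mlm-17-1280 could presumably be passed in place of `toy_tokenize`. Any special tokens added around the sequence would then also need the ignore value. At evaluation time, predictions would be read only at positions whose label is not the ignore value, which gives one prediction per gold word again.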
