Pretrained XLM model with TLM objective generates nonsensical predictions

Hi, I want to use the xlm-mlm-tlm-xnli15-1024 pretrained model, which is the XLM model trained with the auxiliary Translation Language Modeling (TLM) objective.

I want to feed a translation pair to the model, mask some words in one of the two sentences, and then get the model's predictions for the masked words (see the figure for reference).
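For example, XLM's mask token is `<special1>` (i.e. `tokenizer.mask_token`), so masking the word "tomato" in the English source gives:

```python
# Illustration only: XLM's mask token is "<special1>" (tokenizer.mask_token)
mask_token = "<special1>"
src_text = "I love pasta with tomato sauce!".replace("tomato", mask_token)
print(src_text)  # I love pasta with <special1> sauce!
```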

My problem is that the model makes nonsensical predictions, which probably means that I am doing something wrong. Here is a code snippet:

import torch
from transformers import XLMWithLMHeadModel, XLMTokenizer

model_name = "xlm-mlm-tlm-xnli15-1024"
tokenizer = XLMTokenizer.from_pretrained(model_name)
model = XLMWithLMHeadModel.from_pretrained(model_name)
model.eval()

src_lang_id = tokenizer.lang2id["en"] # English
trg_lang_id = tokenizer.lang2id["el"] # Greek

# English source with one word masked, plus its Greek translation
src_text = "I love pasta with tomato sauce!".replace("tomato", tokenizer.mask_token)
trg_text = "Μου αρέσουν τα ζυμαρικά με σάλτσα ντομάτας!"  # Greek for "I love pasta with tomato sauce!"

print(f"{src_text}->{trg_text}")

# get token_ids
src_input_ids = torch.tensor([tokenizer.encode(src_text)])
trg_input_ids = torch.tensor([tokenizer.encode(trg_text)])

src_len = src_input_ids.shape[1]
trg_len = trg_input_ids.shape[1]

# get lang_ids
src_langs = torch.tensor([src_lang_id] * src_len).view(1, -1)
trg_langs = torch.tensor([trg_lang_id] * trg_len).view(1, -1)

# get token_type_ids
src_type = torch.tensor([0] * src_len).view(1, -1)
trg_type = torch.tensor([1] * trg_len).view(1, -1)

input_ids = torch.cat([src_input_ids, trg_input_ids], dim=1)
token_type_ids = torch.cat([src_type, trg_type], dim=1)
lang_ids = torch.cat([src_langs, trg_langs], dim=1)
# TLM-style position ids: restart at 0 for the target sentence
position_ids = torch.cat([torch.arange(src_len), torch.arange(trg_len)])

# encode and predict (no gradients needed at inference time)
with torch.no_grad():
    result = model(input_ids,
                   langs=lang_ids,
                   position_ids=position_ids.view(1, -1),
                   token_type_ids=token_type_ids)

# get predictions for masked token
masked_index = torch.where(input_ids == tokenizer.mask_token_id)[1].tolist()[0]
result = result[0][:, masked_index].topk(5).indices
result = result.tolist()[0]

print("Predictions:", tokenizer.decode(result))
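Note that, following the original XLM TLM setup, I reset the position ids at the start of the target sentence instead of numbering the concatenated pair continuously. For hypothetical sentence lengths of 3 and 4 tokens, the positions look like this:

```python
# Illustration only: TLM-style position ids restart at 0 for the target
# sentence (hypothetical sentence lengths of 3 and 4 tokens)
src_len, trg_len = 3, 4
position_ids = list(range(src_len)) + list(range(trg_len))
print(position_ids)  # [0, 1, 2, 0, 1, 2, 3]
```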

Console output:

I love pasta with <special1> sauce!->Μου αρέσουν τα ζυμαρικά με σάλτσα ντομάτας!
Predictions: with the 'i'my
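For clarity, the last lines of the snippet just take the top-5 token ids at the masked position and decode them together as one string. The same extraction on dummy logits (hypothetical shapes, not the real model output) looks like this:

```python
import torch

# Hypothetical logits of shape (batch, seq_len, vocab_size)
torch.manual_seed(0)
logits = torch.randn(1, 6, 30)
masked_index = 2  # hypothetical position of the mask token
top5 = logits[0, masked_index].topk(5).indices.tolist()
print(top5)  # five candidate token ids, to be decoded with tokenizer.decode
```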

I tried omitting some of the arguments to the model and changing the example sentence pair and the languages, but I always get nonsensical predictions.

What am I doing wrong?

P.S. I had to downgrade to transformers==2.9.0, because in newer versions I get this warning:

Some weights of XLMWithLMHeadModel were not initialized from the model checkpoint at xlm-mlm-tlm-xnli15-1024 and are newly initialized: ['transformer.position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

I also noticed that even in that version the predictions are the same, which suggests that something else is going on.