Masked Language Model Scoring

Is there an implementation of the Pseudo Log-Likelihood for bidirectional language models (i.e. Salazar et al., Masked Language Model Scoring) in transformers? The GitHub repo linked in the paper uses transformers 3.3, and I've been unable to get it to work with 4.5.


what kind of problems are you running into? presumably it's due to a change in the API, so sharing the steps you're taking and the error messages will help with the debugging

Do you mean the GitHub - awslabs/mlm-scoring: Python library & examples for Masked Language Model Scoring (ACL 2020) implementation? I'm assuming there's not much I can do to get a third-party library that is specifically designed for transformers 3.3 to work with a model and tokeniser trained with version 4.5. Specifically, my tokeniser is in the new single-JSON-file format, and as far as I can see the 3.3 library is trying to load from the legacy format. The main issue is that the setup.py of the mlm-scoring library requires ==3.3 rather than >=3.3, so installing it downgrades transformers. I suppose I could try removing the version requirement and see what happens.

But ideally the metric would be available via a library which is more up to date. I'll probably code it up myself, although it won't be very efficient: you need to compute the MLM objective masking each token in turn and then sum the log-likelihoods to get the PLL for a single sentence.
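For what it's worth, here is a rough sketch of what I mean, using the current transformers API. The function name is my own, the model name is just a placeholder, and this is the naive one-forward-pass-per-token version, not an efficient implementation:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

def pseudo_log_likelihood(sentence: str, model_name: str = "bert-base-uncased") -> float:
    """Naive PLL sketch: mask each token in turn and sum the
    log-probabilities the MLM assigns to the true tokens."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    model.eval()

    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    with torch.no_grad():
        # skip the special tokens at positions 0 ([CLS]) and -1 ([SEP])
        for i in range(1, input_ids.size(0) - 1):
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            log_probs = torch.log_softmax(logits, dim=-1)
            total += log_probs[input_ids[i]].item()
    return total
```

One forward pass per token is slow for long sentences; batching all the masked copies of a sentence into a single forward pass would be the obvious first optimisation.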

yes, i was wondering whether you could adapt their code to match the current transformers API.

can you point me to the line of code where this is done? i might be able to suggest a workaround this way

Was this ever implemented in transformers, or was there some other solution? I am attempting to use this scoring technique in my project. Could you please share some details?

Hi, did you find a solution? Could you share your experience?