Hi all,
I tried to use run_mlm.py to reproduce the results of the bert-base-uncased model. However, my reproduced results are consistently lower than the ones reported on this website by the Hugging Face team.
| Task | Metric | Reported by Huggingface | Our reproduced result |
|------|--------|-------------------------|-----------------------|
| MNLI | Matched acc./Mismatched acc. | 83.91/84.10 | 82.34/83.01 |
I suspect there is a problem with my setup. My experiment was run as follows:
(1) I used the code in this file without any changes.
(2) I loaded the bookcorpus and wiki datasets directly from the datasets library; the text is chunked into sequences of 512 tokens.
(3) I set the batch size to 256 and ran 1M steps; I also tried a batch size of 8K with 50K steps. Both results are worse than the reported numbers.
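To make step (2) concrete, here is roughly the grouping logic I mean — a sketch in the style of the `group_texts` function used in the run_mlm.py example, where tokenized examples are concatenated and split into fixed-size blocks (the `BLOCK_SIZE` of 512 matches my setting; the exact helper name and structure are illustrative, not a claim about my private code):

```python
from itertools import chain

BLOCK_SIZE = 512  # max_seq_length used in my runs

def group_texts(examples):
    # Concatenate all token lists in the batch, then split the result
    # into fixed-size blocks, dropping the trailing remainder that
    # does not fill a full block.
    concatenated = {k: list(chain(*examples[k])) for k in examples}
    total_length = (len(concatenated["input_ids"]) // BLOCK_SIZE) * BLOCK_SIZE
    return {
        k: [t[i : i + BLOCK_SIZE] for i in range(0, total_length, BLOCK_SIZE)]
        for k, t in concatenated.items()
    }
```

If this differs from how the original BERT data was prepared (e.g. sentence-pair packing with NSP versus plain contiguous chunking), that alone could explain part of the gap — I would appreciate confirmation either way.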
I would really appreciate it if you could provide a script that I can use to reproduce BERT or RoBERTa. Thank you very much!