How can I check MLM accuracy while training RoBERTa?

HyeyeonKoo · January 13, 2021, 7:12am

Hello.
I am trying to train RoBERTa from scratch. The code and the printed log are below. From the log I can check the MLM loss, but I couldn’t find an option for MLM accuracy. Is there anything I can do to check MLM accuracy?

from transformers import RobertaConfig
config = RobertaConfig(
    num_hidden_layers=4,    
    hidden_size=512,    
    hidden_dropout_prob=0.1,
    num_attention_heads=8,
    attention_probs_dropout_prob=0.1,    
    intermediate_size=2048,    
    vocab_size=34492,
    type_vocab_size=1,    
    initializer_range=0.02,
    max_position_embeddings=512,
    position_embedding_type="absolute"
)

from transformers import RobertaTokenizerFast
tokenizer = RobertaTokenizerFast.from_pretrained("tokenizer", max_len=512)

from transformers import RobertaForMaskedLM
model = RobertaForMaskedLM(config=config)

from transformers import LineByLineTextDataset
train_dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="train.txt",
    block_size=tokenizer.max_len_single_sentence
)

from transformers import DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

from transformers import Trainer, TrainingArguments
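num_train_epochs = 4
# Note: max_steps counts optimizer steps (batches), not examples, and it
# overrides num_train_epochs when both are passed to TrainingArguments.
max_steps = num_train_epochs * len(train_dataset)
warmup_steps = int(max_steps*0.05)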

training_args = TrainingArguments(
    output_dir="output",
    overwrite_output_dir=True,
    
    do_train=True,
    max_steps=max_steps,
    warmup_steps=warmup_steps,
    num_train_epochs=num_train_epochs,

    per_device_train_batch_size=100,
    
    learning_rate=5e-5,
    
    weight_decay=0,
    max_grad_norm=1,
    
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-6,
    
#     disable_tqdm=True
    logging_dir="log",
    logging_first_step=True
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)

trainer.train()

0: {'loss': 10.588345527648926, 'learning_rate': 1.4910684996868758e-09, 'epoch': 0.0011918951132300357}
0: {'loss': 10.444767531507718, 'learning_rate': 7.455342498434379e-07, 'epoch': 0.5959475566150179}
0: {'loss': 9.9342578125, 'learning_rate': 1.4910684996868757e-06, 'epoch': 1.1918951132300357}
0: {'loss': 9.384439453125, 'learning_rate': 2.236602749530314e-06, 'epoch': 1.7878426698450536}
0: {'loss': 8.790998046875, 'learning_rate': 2.9821369993737514e-06, 'epoch': 2.3837902264600714}
0: {'loss': 8.097921875, 'learning_rate': 3.727671249217189e-06, 'epoch': 2.9797377830750893}
0: {'loss': 7.4109140625, 'learning_rate': 4.473205499060628e-06, 'epoch': 3.575685339690107}
0: {'loss': 6.89530859375, 'learning_rate': 5.218739748904065e-06, 'epoch': 4.171632896305125}
0: {'loss': 6.57353515625, 'learning_rate': 5.964273998747503e-06, 'epoch': 4.767580452920143}
0: {'loss': 6.354984375, 'learning_rate': 6.70980824859094e-06, 'epoch': 5.363528009535161}
0: {'loss': 6.194296875, 'learning_rate': 7.455342498434378e-06, 'epoch': 5.959475566150179}
0: {'loss': 6.028484375, 'learning_rate': 8.200876748277817e-06, 'epoch': 6.5554231227651965}
...

sgugger · January 13, 2021, 2:03pm

You have to add two things to check your accuracy. First you should define an evaluation strategy, to regularly evaluate your model on the validation set (in TrainingArguments, add evaluation_strategy="steps" to evaluate every eval_steps steps or evaluation_strategy="epoch" to evaluate every epoch).

Then you need to pass a compute_metrics function to your Trainer that returns the accuracy you want; see the run_glue script for an example of how one is written.
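As a rough illustration of the second step, here is a minimal sketch of a compute_metrics function for masked-token accuracy. It is not taken from run_glue. It assumes the labels come from DataCollatorForLanguageModeling, which sets every non-masked position to -100, and the metric key mlm_accuracy is a made-up name.

```python
import numpy as np

IGNORE_INDEX = -100  # label value the MLM data collator gives to non-masked positions


def compute_metrics(eval_pred):
    """Accuracy over masked positions only (hypothetical metric name)."""
    predictions, labels = eval_pred
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)

    # Accept either raw logits (batch, seq_len, vocab) or token ids (batch, seq_len).
    if predictions.ndim == labels.ndim + 1:
        predictions = predictions.argmax(axis=-1)

    masked = labels != IGNORE_INDEX
    n_masked = int(masked.sum())
    if n_masked == 0:
        # No masked tokens in the evaluation data: report 0.0 rather than NaN.
        return {"mlm_accuracy": 0.0}

    correct = (predictions[masked] == labels[masked]).sum()
    return {"mlm_accuracy": float(correct) / n_masked}
```

To use it, pass compute_metrics=compute_metrics and an eval_dataset to the Trainer, together with the evaluation_strategy setting in TrainingArguments. The Trainer gathers full-vocabulary logits for the whole evaluation set before calling this function, so memory can become a problem on large validation sets. Newer transformers releases accept a preprocess_logits_for_metrics argument on the Trainer that can reduce logits to token ids first, which is why the sketch accepts either form.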

HyeyeonKoo · January 14, 2021, 2:32am

Thank you for replying. I will try it. May I ask one more thing? During training, I found that the training loss is the same regardless of the number of GPUs. The code is the same as above. I tried one, two, and four GPUs, but I see no difference. (I expected training to be faster, but I couldn’t keep the same batch size because of OOM errors. I think this is because the loss is gathered on one GPU. As a result, training speed decreased.) Moreover, when I use multiple GPUs, many warnings occur. How can I improve performance or speed with multiple GPUs?

sgugger · January 14, 2021, 3:38pm

It’s hard to know what the issue is with just comments like this. Seeing which commands you run would allow us to help.

Anne · May 15, 2021, 11:51am

Hi! I am having the same issue. Can you please explain this in more detail? The links you provided don’t seem to work for me.

Anne · May 15, 2021, 11:53am

Hi! I am having the same issue. I also need to find the MLM accuracy. Can you please explain this in more detail? The links you provided don’t seem to work for me. Thank you in advance.

sanjaysingh23 · July 7, 2021, 8:17pm

@sgugger pls help

HyeyeonKoo · August 30, 2021, 4:47am

I solved this problem using the tokenizer and the model output, as shown below.

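    import numpy as np
    import torch

    # Assumes `model`, `tokenizer`, `sequences` (strings containing one "<mask>" each)
    # and `labels` (the expected token for each sequence) are already defined.
    top_n = 5  # example value; the original post does not show it
    result = []
    model.eval()

    for sequence in sequences:
        sequence = sequence.replace("<mask>", tokenizer.mask_token)

        masked_input = tokenizer.encode(sequence, return_tensors="pt")
        mask_token_index = torch.where(masked_input == tokenizer.mask_token_id)[1]

        with torch.no_grad():
            token_logits = model(masked_input)[0]
        mask_token_logits = token_logits[0, mask_token_index, :]

        # Top-n candidate tokens for the first masked position in the sequence
        top_n_token = torch.topk(mask_token_logits, top_n, dim=1).indices[0].tolist()
        result.append([tokenizer.decode([token]).strip() for token in top_n_token])

    labels = np.array(labels)
    result = np.array(result)
    # Top-1 accuracy: share of sequences whose best prediction equals the label
    acc = np.mean(result[:, 0] == labels)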

I referred to the Hugging Face documentation.
I hope this is helpful to @Anne and @sanjaysingh23.

I also think there is a better way to solve this in parallel, instead of iterating over the sequences one by one. If anyone has ideas, please help me.
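One possible way to replace the loop is to pad all sequences into a single batch and select the masked positions with a boolean mask. The sketch below is only an illustration under several assumptions: the helper names batched_mask_accuracy and predict_batch are hypothetical, the model runs on CPU, and the labels are token ids rather than decoded strings, so they would need to be converted first.

```python
import numpy as np


def batched_mask_accuracy(logits, input_ids, mask_token_id, label_ids):
    """Top-1 accuracy over all mask positions in a padded batch.

    logits:    (batch, seq_len, vocab) array of model scores
    input_ids: (batch, seq_len) token ids that were fed to the model
    label_ids: expected token id per masked position, in row-major order
    """
    logits = np.asarray(logits)
    input_ids = np.asarray(input_ids)
    label_ids = np.asarray(label_ids)

    is_mask = input_ids == mask_token_id         # (batch, seq_len) boolean
    predicted = logits[is_mask].argmax(axis=-1)  # one id per masked position
    if predicted.shape != label_ids.shape:
        raise ValueError("need exactly one label id per masked position")
    return float(np.mean(predicted == label_ids))


def predict_batch(model, tokenizer, sequences):
    """Run one padded forward pass and return NumPy logits and input ids."""
    import torch  # imported here so the helper above works without PyTorch

    batch = tokenizer(sequences, padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.cpu().numpy(), batch["input_ids"].cpu().numpy()
```

With these helpers, the accuracy could be computed as batched_mask_accuracy(*predict_batch(model, tokenizer, sequences), tokenizer.mask_token_id, label_ids). For a large number of sequences it would likely be necessary to split them into smaller batches to stay within memory.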
