How can I check MLM accuracy while training RoBERTa?

Hello.
I am trying to train RoBERTa from scratch. The code and the printed log are below. From the log I can see the MLM loss, but I couldn’t find an option for MLM accuracy. Is there anything I can do to check MLM accuracy?

from transformers import RobertaConfig
config = RobertaConfig(
    num_hidden_layers=4,    
    hidden_size=512,    
    hidden_dropout_prob=0.1,
    num_attention_heads=8,
    attention_probs_dropout_prob=0.1,    
    intermediate_size=2048,    
    vocab_size=34492,
    type_vocab_size=1,    
    initializer_range=0.02,
    max_position_embeddings=512,
    position_embedding_type="absolute"
)

from transformers import RobertaTokenizerFast
tokenizer = RobertaTokenizerFast.from_pretrained("tokenizer", max_len=512)

from transformers import RobertaForMaskedLM
model = RobertaForMaskedLM(config=config)

from transformers import LineByLineTextDataset
train_dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="train.txt",
    block_size=tokenizer.max_len_single_sentence
)

from transformers import DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

from transformers import Trainer, TrainingArguments
num_train_epochs = 4
max_steps = num_train_epochs * len(train_dataset)
warmup_steps = int(max_steps*0.05)

training_args = TrainingArguments(
    output_dir="output",
    overwrite_output_dir=True,
    
    do_train=True,
    max_steps=max_steps,
    warmup_steps=warmup_steps,
    num_train_epochs=num_train_epochs,

    per_device_train_batch_size=100,
    
    learning_rate=5e-5,
    
    weight_decay=0,
    max_grad_norm=1,
    
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-6,
    
#     disable_tqdm=True
    logging_dir="log",
    logging_first_step=True
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)

trainer.train()
0: {'loss': 10.588345527648926, 'learning_rate': 1.4910684996868758e-09, 'epoch': 0.0011918951132300357}
0: {'loss': 10.444767531507718, 'learning_rate': 7.455342498434379e-07, 'epoch': 0.5959475566150179}
0: {'loss': 9.9342578125, 'learning_rate': 1.4910684996868757e-06, 'epoch': 1.1918951132300357}
0: {'loss': 9.384439453125, 'learning_rate': 2.236602749530314e-06, 'epoch': 1.7878426698450536}
0: {'loss': 8.790998046875, 'learning_rate': 2.9821369993737514e-06, 'epoch': 2.3837902264600714}
0: {'loss': 8.097921875, 'learning_rate': 3.727671249217189e-06, 'epoch': 2.9797377830750893}
0: {'loss': 7.4109140625, 'learning_rate': 4.473205499060628e-06, 'epoch': 3.575685339690107}
0: {'loss': 6.89530859375, 'learning_rate': 5.218739748904065e-06, 'epoch': 4.171632896305125}
0: {'loss': 6.57353515625, 'learning_rate': 5.964273998747503e-06, 'epoch': 4.767580452920143}
0: {'loss': 6.354984375, 'learning_rate': 6.70980824859094e-06, 'epoch': 5.363528009535161}
0: {'loss': 6.194296875, 'learning_rate': 7.455342498434378e-06, 'epoch': 5.959475566150179}
0: {'loss': 6.028484375, 'learning_rate': 8.200876748277817e-06, 'epoch': 6.5554231227651965}
...

You have to add two things to check your accuracy. First you should define an evaluation strategy, to regularly evaluate your model on the validation set (in TrainingArguments, add evaluation_strategy="steps" to evaluate every eval_steps steps or evaluation_strategy="epoch" to evaluate every epoch).

Then you need to pass a compute_metrics function to your Trainer that returns the accuracy you want. The run_glue script shows one way such a function can be written.
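For masked language modeling specifically, a compute_metrics function could look like the sketch below. It assumes the Trainer hands over the logits and labels as NumPy arrays that unpack as a pair, and that unmasked positions carry the label -100, which is what DataCollatorForLanguageModeling produces. The metric name "mlm_accuracy", the eval_dataset variable, and the eval_steps value in the comments are placeholders, not something taken from the original post.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Token-level accuracy over masked positions only.

    Assumes eval_pred unpacks into (logits, labels) with shapes
    (batch, seq_len, vocab_size) and (batch, seq_len), and that
    positions that were not masked have the label -100.
    """
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)

    # Only masked positions count towards the accuracy.
    masked = labels != -100
    n_masked = int(masked.sum())
    if n_masked == 0:
        return {"mlm_accuracy": 0.0}

    correct = (predictions[masked] == labels[masked]).sum()
    return {"mlm_accuracy": float(correct) / n_masked}


# Hypothetical wiring, reusing the variable names from the question
# (eval_dataset is assumed to be a held-out dataset you build yourself):
#
# training_args = TrainingArguments(
#     output_dir="output", evaluation_strategy="steps", eval_steps=500)
# trainer = Trainer(
#     model=model, args=training_args, data_collator=data_collator,
#     train_dataset=train_dataset, eval_dataset=eval_dataset,
#     compute_metrics=compute_metrics)
```

Two caveats. The full logits for the evaluation set are collected before compute_metrics runs, so with a vocabulary of this size a small validation set is advisable. The collator also draws new random masks on every pass, so the number will fluctuate a little between evaluations.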

Thank you for replying. I will try it. Can I ask something more? During training, I found that the training loss is the same regardless of the number of GPUs. The code is the same as above. I tried one, two, and four GPUs, but I see no difference. (I expected training to be faster, but I couldn’t keep the same batch size because of OOM. I think this is because the loss is gathered on one GPU. As a result, training is slower.) Moreover, when I use multiple GPUs, many warnings occur. How can I improve performance or speed with multiple GPUs?

It’s hard to know what the issue is with just comments like this. Seeing which commands you run would allow us to help.

Hi! I am having the same issue. Could you please explain this in more detail? The links you provided do not seem to work for me.

Hi! I am having the same issue. I also need to find the MLM accuracy. Could you please explain this in more detail? The links you provided do not seem to work for me. Thank you in advance.

@sgugger pls help

I solved this problem by using the tokenizer and the model output, as shown below.

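    import numpy as np
    import torch

    # Assumes sequences, labels, top_n, tokenizer and model are defined elsewhere,
    # and that each sequence contains exactly one "<mask>" placeholder.
    result = []
    model.eval()
    for sequence in sequences:
        sequence = sequence.replace("<mask>", tokenizer.mask_token)

        masked_input = tokenizer.encode(sequence, return_tensors="pt")
        # Column index of the mask token in the single-row input
        mask_token_index = torch.where(masked_input == tokenizer.mask_token_id)[1]

        with torch.no_grad():
            token_logits = model(masked_input)[0]
        mask_token_logits = token_logits[0, mask_token_index, :]

        # Decode the top_n most likely tokens for the masked position
        top_n_token = torch.topk(mask_token_logits, top_n, dim=1).indices[0].tolist()
        result.append([tokenizer.decode([token]).strip() for token in top_n_token])

    # Top-1 accuracy: compare the best prediction with the expected token
    labels = np.array(labels)
    result = np.array(result)
    acc = np.mean(result[:, 0] == labels)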

I referred to the Hugging Face documentation.
I hope this is helpful to @Anne and @sanjaysingh23.

I also think there is a better way to solve this problem in parallel instead of iterating over the sequences one by one. If anyone has ideas, please help me.
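One possible way to batch the loop, offered as a sketch rather than a tested recipe for any particular setup: tokenize several sequences at once with padding, run a single forward pass per batch, and pick out the logits at every mask position with one indexing operation. The function name masked_top1_accuracy and the batch_size default are made up for illustration. The sketch assumes each sequence contains exactly one mask token, already written as tokenizer.mask_token, and that the model and inputs are on the same device.

```python
import torch


def masked_top1_accuracy(model, tokenizer, sequences, labels, batch_size=32):
    """Top-1 accuracy for sequences that each contain exactly one mask token."""
    model.eval()
    correct = 0
    for start in range(0, len(sequences), batch_size):
        batch = sequences[start:start + batch_size]
        # Pad to the longest sequence so the whole batch forms one tensor.
        enc = tokenizer(batch, return_tensors="pt", padding=True)
        with torch.no_grad():
            logits = model(**enc)[0]  # (batch, seq_len, vocab_size)

        # Row and column index of every mask token in the batch.
        # Rows come back in ascending order, so with one mask per
        # sequence the predictions line up with the batch order.
        rows, cols = torch.where(enc["input_ids"] == tokenizer.mask_token_id)
        top1_ids = logits[rows, cols].argmax(dim=-1).tolist()

        predicted = [tokenizer.decode([i]).strip() for i in top1_ids]
        expected = labels[start:start + batch_size]
        correct += sum(p == e for p, e in zip(predicted, expected))
    return correct / len(sequences)
```

Called as masked_top1_accuracy(model, tokenizer, sequences, labels), this should give the same top-1 number as the loop above, with far fewer forward passes. If a sequence has more than one mask, the predictions and labels no longer line up and the pairing logic would need to change.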