Hi all, I am trying to run Ray Tune for my masked language model, and I want to find the hyperparameters that minimize the model's perplexity. I cannot figure out how to calculate perplexity from the per-token logits returned as `EvalPrediction.predictions`. Any help will be greatly appreciated. Thank you!

The following code snippet shows the training setup.
```python
from transformers import AutoModelForMaskedLM, Trainer

model_checkpoint = "distilroberta-base"
model = AutoModelForMaskedLM.from_pretrained(model_checkpoint).to("cuda")

def compute_custom_metric(eval_pred):
    # prints (3387, 32, 50265) -> (num_eval_examples, max_seq_len, vocab_size)
    print(eval_pred.predictions.shape)
    # prints (3387, 32) -> (num_eval_examples, max_seq_len)
    print(eval_pred.label_ids.shape)
    return {'custom_metric': 0}

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train,
    eval_dataset=validation,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_custom_metric,
)

trainer.evaluate()
```
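For reference, here is one way the metric could be computed from those two arrays: take the cross-entropy of the logits against the labels at the masked positions only, then exponentiate it. This is a sketch, assuming non-masked positions carry the label `-100` (the convention used by `DataCollatorForLanguageModeling`); the function name `perplexity_from_eval_pred` is my own, not part of the library.

```python
import numpy as np

def perplexity_from_eval_pred(predictions, label_ids):
    """Masked-LM perplexity from (num_examples, seq_len, vocab) logits
    and (num_examples, seq_len) labels, where -100 marks unmasked tokens."""
    logits = predictions.astype(np.float64)
    # numerically stable log-softmax over the vocabulary axis
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    mask = label_ids != -100
    # gather the log-probability of each target token (use index 0 as a
    # dummy at unmasked positions; those entries are zeroed out by the mask)
    token_log_probs = np.take_along_axis(
        log_probs, np.where(mask, label_ids, 0)[..., None], axis=-1
    ).squeeze(-1)
    # mean negative log-likelihood over masked tokens, then exponentiate
    nll = -(token_log_probs * mask).sum() / mask.sum()
    return float(np.exp(nll))
```

Inside `compute_custom_metric` you could then return `{'perplexity': perplexity_from_eval_pred(eval_pred.predictions, eval_pred.label_ids)}` and point Ray Tune at that key. Since perplexity is just the exponential of the mean cross-entropy, exponentiating the `eval_loss` the Trainer already reports should give a very similar number.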