Hi everyone,
I’m running into an issue while evaluating my language model with the compute_metrics function. During evaluation, I see -100 values in the predictions tensor, which leads to the following error:

IndexError: piece id is out of range.

After investigating, I realized these -100 values cause the tokenizer to fail during batch_decode, as shown in this part of the code:
n_labels = labels.shape[1]
prompt = predictions[:, :-n_labels]    # everything before the label-aligned span
output = predictions[:, -n_labels:]    # the label-aligned span
decoded_prompts = self.tokenizer.batch_decode(prompt, skip_special_tokens=True)
decoded_outputs = self.tokenizer.batch_decode(output, skip_special_tokens=True)
To work around this, I replaced -100 with the padding token ID like this:
if np.any(predictions == -100):
    predictions = np.where(predictions == -100, self.tokenizer.pad_token_id, predictions)
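For completeness, putting the two snippets together, the compute_metrics side now looks roughly like this (a simplified sketch: the signature and the unpacking of eval_preds are approximations, and the actual metric computation on the decoded strings is omitted):

import numpy as np

def compute_metrics(self, eval_preds):
    # Approximate wiring; in my real code this is a method and the exact
    # signature differs, but the decoding flow is the one shown above.
    predictions, labels = eval_preds.predictions, eval_preds.label_ids

    # Workaround: swap -100 for the pad token id so batch_decode does not
    # raise "piece id is out of range".
    predictions = np.where(predictions == -100, self.tokenizer.pad_token_id, predictions)

    n_labels = labels.shape[1]
    prompt = predictions[:, :-n_labels]    # everything before the label-aligned span
    output = predictions[:, -n_labels:]    # the label-aligned span

    decoded_prompts = self.tokenizer.batch_decode(prompt, skip_special_tokens=True)
    decoded_outputs = self.tokenizer.batch_decode(output, skip_special_tokens=True)
    # ... metrics computed from decoded_prompts / decoded_outputs ...
    return {}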
This partially solves the error (the decoded output is gibberish English), and I’m still puzzled as to why -100 values are present in my predictions, given that this value is typically used for label masking. To be clear, here’s the relevant part of my evaluation loop that leads to this:
loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
preds_host = logits if preds_host is None else nested_concat(preds_host, logits, padding_index=-100)
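For reference, here is a minimal, self-contained sketch of how I understand nested_concat with padding_index=-100 to behave when prediction tensors from different eval steps have different sequence lengths (this is just my reading of transformers.trainer_pt_utils, so please correct me if it’s wrong):

import torch
from transformers.trainer_pt_utils import nested_concat

# Prediction tensors from two eval steps with different sequence lengths.
step_a = torch.ones(2, 5, dtype=torch.long)   # batch of 2, seq len 5
step_b = torch.ones(2, 3, dtype=torch.long)   # batch of 2, seq len 3

# As far as I can tell, the shorter tensor is padded along the sequence
# dimension with padding_index before concatenation along the batch dimension.
merged = nested_concat(step_a, step_b, padding_index=-100)
print(merged.shape)             # torch.Size([4, 5])
print((merged == -100).sum())   # tensor(4) -> the padded positions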
And here is how many -100 values I see per token location in the preds_host tensor:
sum(preds_host == -100)
# output
tensor([ ... 0, 16, 32, 64, 80, ..., 284, 284, 284, 284], device='cuda:1')

sum(preds_host == -100).shape
# output
torch.Size([699])
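(For clarity, the same count written with explicit torch ops; this assumes preds_host is 2-D here, i.e. (batch, sequence):)

(preds_host == -100).sum(dim=0)   # count of -100 at each token position (what the builtin sum above returns)
(preds_host == -100).sum(dim=1)   # count of -100 per sample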
It looks like different samples in the batch have varying numbers of -100 values in the later parts of the token vector.
My main question: why would -100 values be showing up in the predictions tensor during evaluation when they’re typically used for label masking? And how can I address this more cleanly?
Thanks for any insights!