Hosted inference ignores attention mask resulting in wrong predictions

I recently uploaded my model to the Hugging Face Hub (see andreas122001/bloomz-560m-wiki-detector · Hugging Face): a bloomz-560m fine-tuned for binary text classification. Using the Trainer API, I got good metrics (>95% accuracy and precision), so the training seems to have worked fine.

However, when testing the hosted inference on the model page, I am not getting the predictions I expect; it seems to predict the same label every time. When testing the model locally, I found that if I do not pass the attention mask (e.g. model(encoding['input_ids'])), I get the same results as the hosted inference, but if I do pass it (e.g. model(**encoding)), I get the expected result. I suspect the hosted inference is not passing the attention mask from the tokenized input to the model?

Here is the code for local testing:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model from the Hub repo above
model_id = "andreas122001/bloomz-560m-wiki-detector"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

def predict(data):
    # Tokenize the input text (returns input_ids and attention_mask)
    encoding = tokenizer(data, return_tensors="pt", padding="max_length", truncation=True)
    encoding = {k: v.to(model.device) for k, v in encoding.items()}

    # Forward pass
    with torch.no_grad():
        outputs = model(**encoding)  # model(encoding['input_ids']) gives wrong predictions
        logits = outputs.logits.squeeze()

    # Convert logits to probabilities
    probabilities = torch.softmax(logits.cpu(), dim=-1).numpy()
    return probabilities.tolist()
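
To make the comparison explicit, this is roughly what I ran to see the difference (the input text is just a placeholder):

# Compare the two call styles on the same encoding
text = "Some example text to classify."
enc = tokenizer(text, return_tensors="pt", padding="max_length", truncation=True)

with torch.no_grad():
    with_mask = model(**enc).logits             # input_ids + attention_mask
    without_mask = model(enc["input_ids"]).logits  # input_ids only

print(torch.softmax(with_mask, dim=-1))     # expected probabilities
print(torch.softmax(without_mask, dim=-1))  # matches the hosted inference output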

So I guess something is wrong in the model's configuration, e.g. in config.json or tokenizer_config.json? Is there a way to force the hosted inference to use the attention mask?
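
For reference, this is a quick way I check what the hosted endpoint returns from a script (assuming the standard Inference API URL for the repo and an access token in the HF_TOKEN environment variable):

import os
import requests

API_URL = "https://api-inference.huggingface.co/models/andreas122001/bloomz-560m-wiki-detector"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

# Send the same placeholder text to the hosted pipeline
response = requests.post(API_URL, headers=headers, json={"inputs": "Some example text to classify."})
print(response.json())  # label/score pairs returned by the hosted inference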
