Hosted inference ignores attention mask resulting in wrong predictions

I recently uploaded my model to the Hugging Face Hub (see andreas122001/bloomz-560m-wiki-detector · Hugging Face): a bloomz-560m fine-tuned for binary text classification. Using the Trainer API, I got good metrics (>95% accuracy and precision), so the training seems to have worked fine.

However, when testing the hosted inference on the model page, I am not getting the predictions I expect; it seems to predict the same label every time. When testing the model locally, I found that if I do not pass the attention mask (e.g. model(encoding['input_ids'])), I get the same results as the hosted inference, but if I do pass it (e.g. model(**encoding)), I get the expected result. I suspect the hosted inference is not passing the attention mask from the tokenized input to the model?

Here is the code for local testing:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model from the Hub repo above
model_id = "andreas122001/bloomz-560m-wiki-detector"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

def predict(data):
    # Tokenize the input text (returns input_ids and attention_mask)
    encoding = tokenizer(data, return_tensors="pt", padding="max_length", truncation=True)
    encoding = {k: v.to(model.device) for k, v in encoding.items()}

    # Forward pass
    with torch.no_grad():
        outputs = model(**encoding)  # model(encoding['input_ids']) gives wrong predictions
        logits = outputs.logits.squeeze()

    # Convert logits to probabilities
    probabilities = torch.softmax(logits.cpu(), dim=-1).numpy()
    return probabilities.tolist()
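
To make the comparison explicit, this is roughly what I ran to see the difference (the input text is just a placeholder):

# Compare the two call styles on the same encoding
text = "Some example text to classify."
enc = tokenizer(text, return_tensors="pt", padding="max_length", truncation=True)

with torch.no_grad():
    with_mask = model(**enc).logits             # input_ids + attention_mask
    without_mask = model(enc["input_ids"]).logits  # input_ids only

print(torch.softmax(with_mask, dim=-1))     # expected probabilities
print(torch.softmax(without_mask, dim=-1))  # matches the hosted inference output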

So I guess something is wrong in the model's configuration, e.g. in config.json or tokenizer_config.json? Is there a way to force the hosted inference to use the attention mask?
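
For reference, this is a quick way I check what the hosted endpoint returns from a script (assuming the standard Inference API URL for the repo and an access token in the HF_TOKEN environment variable):

import os
import requests

API_URL = "https://api-inference.huggingface.co/models/andreas122001/bloomz-560m-wiki-detector"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

# Send the same placeholder text to the hosted pipeline
response = requests.post(API_URL, headers=headers, json={"inputs": "Some example text to classify."})
print(response.json())  # label/score pairs returned by the hosted inference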
