[text classification] different result format for inference API and inference endpoint

Hi all!

I am trying to deploy an Inference Endpoint for the DeBERTa Base MNLI model. While the endpoint deploys successfully, I am finding that the result format differs between the free Inference API and the deployed Inference Endpoint. See the results below:

Inference endpoint:

>>> requests.post("<MY INFERENCE ENDPOINT>", headers={"Authorization": f"Bearer <HF API KEY>"}, json={"inputs": "[CLS] I love you. [SEP] I like you. [SEP]"}).json()
[{'label': 'ENTAILMENT', 'score': 0.9248048663139343}]

Inference API:

>>> requests.post("https://api-inference.huggingface.co/models/microsoft/deberta-base-mnli", headers={"Authorization": f"Bearer <HF API KEY>"}, json={"inputs": "[CLS] I love you. [SEP] I like you. [SEP]"}).json()
[[{'label': 'ENTAILMENT', 'score': 0.9248047471046448}, {'label': 'NEUTRAL', 'score': 0.07485755532979965}, {'label': 'CONTRADICTION', 'score': 0.00033764145337045193}]]

Note that through the Inference API I receive ENTAILMENT, NEUTRAL, and CONTRADICTION scores, whereas with the Inference Endpoint I only receive the top ENTAILMENT score. For my application, I need all three scores.
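For what it's worth, this is the request I would try next, on the assumption that Inference Endpoints forward a "parameters" field to the underlying transformers text-classification pipeline (where top_k=None means "score every label"). I haven't confirmed this is actually supported, hence the question:

import requests

# Hypothetical workaround: ask the text-classification pipeline for all
# labels via top_k=None. This assumes the endpoint passes the "parameters"
# field through to the pipeline call, which I have not verified.
response = requests.post(
    "<MY INFERENCE ENDPOINT>",
    headers={"Authorization": "Bearer <HF API KEY>"},
    json={
        "inputs": "[CLS] I love you. [SEP] I like you. [SEP]",
        "parameters": {"top_k": None},
    },
)
print(response.json())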

It’s not immediately clear to me what’s going on here. I created the Inference Endpoint via the Deploy dropdown on the deberta-base-mnli model page, so I would expect the deployed endpoint to return the same result format as the Inference API. I can reproduce this with both CPU and GPU Inference Endpoints for deberta-base-mnli, and also with an Inference Endpoint for deberta-large-mnli.
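In case the answer turns out to be "write a custom handler", here is a minimal sketch of what I have in mind, based on the Inference Endpoints custom handler interface (an EndpointHandler class in handler.py with __init__ and __call__); using top_k=None to get all scores is my assumption:

# handler.py -- minimal custom handler sketch. top_k=None is the
# transformers text-classification argument for returning every label.
from typing import Any, Dict, List

from transformers import pipeline


class EndpointHandler:
    def __init__(self, path: str = ""):
        # "path" points at the model repository the endpoint was built from
        self.pipeline = pipeline("text-classification", model=path, top_k=None)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        inputs = data.pop("inputs", data)
        # For a single input string, this should return a list of
        # {'label': ..., 'score': ...} dicts, one per class.
        return self.pipeline(inputs)

That said, I'd rather avoid maintaining a custom handler if the stock one can already return all scores.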

Thanks in advance for the help!