POST request format for Phi-3 via the Serverless API

Hi all,

I’m struggling to make POST requests to Phi-3-mini-4k-instruct via the Serverless API. Here is my best attempt so far in Python:

```python
import json
import requests

serverless_api_token = "hf_XXXX..."  # Hugging Face access token (redacted)
model_endpoint = "https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-4k-instruct"
headers = {"Authorization": f"Bearer {serverless_api_token}", "Content-Type": "application/json"}

# Prompt wrapped in Phi-3's chat markers: <|system|>, <|user|>, <|end|>, <|assistant|>
json_body = {
    "inputs": "<|system|> You are a helpful AI assistant.<|end|><|user|>What is the biggest public financial institution in Australia?<|end|><|assistant|>",
    "parameters": {"max_length": 30},
}
response = requests.post(model_endpoint, headers=headers, json=json_body)
print(response.status_code, response.content)
```

The response has status code 424, with the body "error":"Request failed during generation: Server error: CANCELLED","error_type":"generation". If I drop the Phi-3 formatting and send plain text instead, e.g.:

```python
json_body = {
    "inputs": "You are a helpful AI assistant. What is the biggest public financial institution in Australia?",
    "parameters": {"max_length": 30},
}
```

then I get HTTP 200, so the request itself seems fine, but response.content contains only:

```
b'[{"generated_text":"You are a helpful AI assistant. What is the biggest public financial institution in Australia?\\n"}]'
```

So the API appears to have simply echoed my prompt back with a newline appended.
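To see exactly what the model added on top of the echoed prompt, the response body can be parsed and the prompt prefix stripped. This is a minimal sketch, assuming a successful body has the `[{"generated_text": ...}]` shape shown above; `extract_continuation` is a hypothetical helper name, not part of requests or any Hugging Face library.

```python
import json


def extract_continuation(status_code, content, prompt):
    """Return only the text generated after `prompt` (hypothetical helper).

    `content` is the raw response body as bytes, e.g. response.content.
    Non-200 responses raise RuntimeError carrying the raw error body.
    """
    if status_code != 200:
        raise RuntimeError(f"HTTP {status_code}: {content.decode('utf-8', 'replace')}")
    payload = json.loads(content)
    text = payload[0]["generated_text"]
    # The API echoed the prompt here, so drop it to keep only new text.
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text


raw = b'[{"generated_text":"You are a helpful AI assistant. What is the biggest public financial institution in Australia?\\n"}]'
prompt = "You are a helpful AI assistant. What is the biggest public financial institution in Australia?"
print(repr(extract_continuation(200, raw, prompt)))  # '\n'
```

Applied to the output above, this shows that the model's entire continuation was a single newline, i.e. it stopped almost immediately rather than answering.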

EDIT: It’s getting weirder. I tried:

```python
json_body = {"inputs": "The biggest public financial institution in Australia is "}
```

and this also returns status 200, but response.content is:

```
b'[{"generated_text":"The biggest public financial institution in Australia is ........................\\n"}]'
```

I should add that I also have this model stored locally, and when I run it on my laptop it generates perfectly sensible responses to this prompt, so I’m not sure why the Serverless API produces such strange output.

EDIT 2: If I switch the model to meta-llama/Meta-Llama-3-8B-Instruct, everything works as expected, so the problem appears to be specific to Phi-3.
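For comparison, here is a sketch of a request body built differently from the one in the original snippet. It assumes the newline-separated layout of Phi-3's published chat template (each role tag on its own line, each turn closed by `<|end|>`, and a trailing `<|assistant|>` line that cues the reply). It also uses `max_new_tokens` and `return_full_text`, the parameters documented for the text-generation task, in place of `max_length`. `build_phi3_prompt` and `build_request_body` are hypothetical helper names, and this is not confirmed to fix the 424 error.

```python
def build_phi3_prompt(messages):
    """Render chat messages as a Phi-3-style prompt (assumed template layout)."""
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        # Role tag on its own line, then the content, then the end-of-turn marker.
        parts.append(f"<|{role}|>\n{message['content']}<|end|>\n")
    # Open the assistant turn so generation continues from here.
    parts.append("<|assistant|>\n")
    return "".join(parts)


def build_request_body(messages, max_new_tokens=30):
    """Hypothetical helper: JSON body for a text-generation request."""
    return {
        "inputs": build_phi3_prompt(messages),
        "parameters": {
            # Caps only the newly generated tokens, not prompt + output.
            "max_new_tokens": max_new_tokens,
            # Ask the API not to echo the prompt in generated_text.
            "return_full_text": False,
        },
    }


body = build_request_body([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What is the biggest public financial institution in Australia?"},
])
print(body["inputs"])
```

The resulting `body` would be sent the same way as before, via `requests.post(model_endpoint, headers=headers, json=body)`.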

Cheers and thanks in advance to anyone who can help.

Colin