Unclear documentation for using CLIP on SageMaker for inference

According to the code snippet on the CLIP model page, this is how I can deploy and run inference using SageMaker (by the way, I’m new to AWS, coming from GCP, and discovered this awful and very complex UI XD):

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'openai/clip-vit-large-patch14',
	'HF_TASK':'feature-extraction'  # selects the transformers pipeline the default handler runs
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.17.0',
	pytorch_version='1.10.2',
	py_version='py38',
	env=hub,
	role="sm", 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge' # ec2 instance type
)

# predictor.predict({
# 	'inputs': No input example has been defined for this model task.
# })
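
For an ordinary feature-extraction model I’d guess the call looks something like this (guessing, since the hub defines no input example for CLIP):

# hypothetical call; `predictor` is the object returned by deploy() above
predictor.predict({
	'inputs': 'a photo of two cats on a couch'
})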

I don’t understand. Why would I deploy a model and run inference in the same code? Those are two different steps to me.

I want to deploy once and make queries later.

Is there a way to get a predictor object from an existing deployment, without deploying again? Very confusing.
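
After some digging, it looks like the SDK can attach a predictor to an already-running endpoint by name; a minimal sketch (assuming the endpoint deployed above is still up):

import sagemaker
from sagemaker.huggingface import HuggingFacePredictor

# Attach to an existing endpoint; nothing new is deployed.
predictor = HuggingFacePredictor(
	endpoint_name="huggingface-pytorch-inference-2023-03-18-13-33-18-657",
	sagemaker_session=sagemaker.Session(),
)

That predictor should then behave like the one returned by deploy().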

Right now I was just trying to do a pure requests-based query, so I tried this:

import boto3
import json
import numpy as np
import requests
from PIL import Image  # installed with `pip install pillow`

client = boto3.client('sagemaker-runtime')

custom_attributes = "c000b4f9-df62-4c85-a0bf-7c525f9104a4"  # An example of a trace ID.
endpoint_name = "huggingface-pytorch-inference-2023-03-18-13-33-18-657"
content_type = "application/json"                           # The MIME type of the input data in the request.
accept = "application/json"                                 # The desired MIME type of the inference in the response.
# payload = "the dog is cute"                                 # The input data to send to the endpoint for inference.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"

image = Image.open(requests.get(url, stream=True).raw)
image_array = np.array(image)

data = {
  "inputs": "the mesmerizing performances of the leads keep the film grounded and keep the audience riveted .",
  "pixel_values": image_array.tolist()  # nested lists, so the array is JSON-serializable
}

response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    CustomAttributes=custom_attributes,
    ContentType=content_type,
    Accept=accept,
    Body=json.dumps(data).encode('utf-8')
)

print(response)

Instead of a response, the call raises a ModelError:
{
	"name": "ModelError",
	"message": "An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message \"{\n  \"code\": 400,\n  \"type\": \"InternalServerException\",\n  \"message\": \"You have to specify pixel_values\"\n}\n\". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/huggingface-pytorch-inference-2023-03-18-13-33-18-657 in account 580378397133 for more information.",
	"stack": "..."
}

I tried looking at the model’s config.json and even diving into the transformers code, but I couldn’t find what the API of this model is once deployed.
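
For reference, the 400 above is the same error the bare model raises locally when it gets text but no image; running CLIP directly with transformers shows which tensors the forward pass expects (a local sketch, nothing SageMaker-specific):

from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The processor produces both input_ids (text) and pixel_values (image);
# calling the model without pixel_values raises the same
# "You have to specify pixel_values" ValueError seen above.
inputs = processor(text=["two cats on a couch"], images=image,
                   return_tensors="pt", padding=True)
outputs = model(**inputs)
print(outputs.text_embeds.shape, outputs.image_embeds.shape)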

Thanks for any help

PS: if you have any tips on where to look in general (model hub code/config) to figure out a deployed model’s API, I’d appreciate them; I’ve always found this difficult to reverse engineer.
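
The closest I’ve found so far: the model’s pipeline tag on the hub seems to be what the deploy snippet puts into HF_TASK, and it can be read programmatically (a sketch using huggingface_hub, assuming it’s installed):

from huggingface_hub import model_info

info = model_info("openai/clip-vit-large-patch14")
# The expected request format is then whatever the corresponding
# transformers pipeline accepts as input.
print(info.pipeline_tag)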

Here’s an article that could be helpful: Set Up and Run OpenAI's CLIP on Amazon SageMaker for Inference

The relevant code snippet:

import requests
from PIL import Image
import numpy as np
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

endpoint_name = "huggingface-pytorch-inference-2023-03-18-13-33-18-657"  # Existing endpoint

# Attach to the already-deployed endpoint; the JSON serializer/deserializer
# set Content-Type: application/json and decode the JSON response.
clip_predictor = Predictor(endpoint_name,
                           serializer=JSONSerializer(),
                           deserializer=JSONDeserializer())

url = "http://images.cocodataset.org/val2017/000000039769.jpg"

image = Image.open(requests.get(url, stream=True).raw)
image_array = np.array(image)

data = {
  "inputs": "the mesmerizing performances of the leads keep the film grounded and keep the audience riveted.",
  "pixel_values": image_array.tolist()
}

response = clip_predictor.predict(data)  # the serializer handles json.dumps
print(response)
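
(Note that Predictor.predict ultimately calls the same invoke_endpoint API as the boto3 snippet above; the serializer and deserializer just take care of the Content-Type/Accept headers and of encoding/decoding the JSON for you.)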