Inference Input for Vision Models

I set up a serverless inference endpoint for YOLOS, but I'm struggling to use it for prediction.

The example notebook uses a text model and simply passes a string, but what do I pass to the predictor for YOLOS (and other vision models)?

I tried a NumPy array created from a PIL Image, but that gave me this client error:

/home/johannes/Projects/Vision/object-detection/src/object-detection/sagemaker_serverless.ipynb Cell 12' in <cell line: 2>()
      1 pil_im = Image.open("20190601_14005820190911-1-u4n8ej.jpg")
----> 2 hf_predictor.predict(np.asarray(pil_im))

File ~/.local/share/virtualenvs/object-detection-a0LJrJAA/lib/python3.9/site-packages/sagemaker/, in Predictor.predict(self, data, initial_args, target_model, target_variant, inference_id)
    131 """Return the inference from the specified endpoint.
    133 Args:
    155         as is.
    156 """
    158 request_args = self._create_request_args(
    159     data, initial_args, target_model, target_variant, inference_id
    160 )
--> 161 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    162 return self._handle_response(response)

File ~/.local/share/virtualenvs/object-detection-a0LJrJAA/lib/python3.9/site-packages/botocore/, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
    504     raise TypeError(
    505         f"{py_operation_name}() only accepts keyword arguments."
    506     )
    507 # The "self" in this scope is referring to the BaseClient.
--> 508 return self._make_api_call(operation_name, kwargs)
--> 915     raise error_class(parsed_response, operation_name)
    916 else:
    917     return parsed_response

ClientError: An error occurred (413) when calling the InvokeEndpoint operation: 

Hello @johko,

Here is an example for image segmentation, which you can adapt for YOLOS: notebooks/sagemaker-notebook.ipynb at main · huggingface/notebooks · GitHub

Thank you Philipp, I hadn't found that notebook before. I can run the SegFormer example in there without a problem, including serverless.
But when I try to adapt it for YOLOS (serverless or not, it doesn't matter), I get the error above.

Here is the code I’m trying for the YOLOS serverless endpoint:

hub = {
    # NOTE: the original post truncated this dict; these are assumed values
    'HF_MODEL_ID': 'hustvl/yolos-small',
    'HF_TASK': 'object-detection',
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,                      # configuration for loading model from Hub
    role=role,                    # IAM role with permissions to create an Endpoint
    transformers_version="4.17",  # transformers version used
    pytorch_version="1.10",       # pytorch version used
    py_version="py38",            # python version used
)

# Specify MemorySizeInMB and MaxConcurrency in the serverless config object
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=4096, max_concurrency=10,
)

# deploy the endpoint (the original snippet was cut off here;
# serverless_inference_config is the standard kwarg for a serverless deploy)
yolos_predictor = huggingface_model.deploy(
    serverless_inference_config=serverless_config
)

Ah, my bad @johko! YOLOS was added after 4.17.0 and we are still stuck on that release until the next DLC. What you could do is create a requirements.txt, add transformers==4.20.1 to it, and deploy the model with a "custom" script: notebooks/sagemaker-notebook.ipynb at main · huggingface/notebooks · GitHub
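A hedged sketch of what such a custom script could look like: the `model_fn`/`predict_fn` hooks follow the Hugging Face inference toolkit convention, the class names match transformers 4.20, and the confidence threshold and return format are assumptions:

```python
# code/inference.py -- sketch of a custom handler for the HF DLC.
# Pair it with code/requirements.txt containing a single line:
#     transformers==4.20.1
import io


def filter_detections(scores, labels, boxes, threshold=0.9):
    """Keep only detections above the (assumed) confidence threshold."""
    keep = [i for i, s in enumerate(scores) if s > threshold]
    return (
        [scores[i] for i in keep],
        [labels[i] for i in keep],
        [boxes[i] for i in keep],
    )


def model_fn(model_dir):
    """Load the YOLOS model and feature extractor shipped in model_dir."""
    from transformers import YolosFeatureExtractor, YolosForObjectDetection

    return (
        YolosForObjectDetection.from_pretrained(model_dir),
        YolosFeatureExtractor.from_pretrained(model_dir),
    )


def predict_fn(image_bytes, model_and_extractor):
    """Run object detection on raw image bytes (content type image/x-image)."""
    import torch
    from PIL import Image

    model, extractor = model_and_extractor
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    with torch.no_grad():
        outputs = model(**extractor(images=image, return_tensors="pt"))
    # post_process expects target sizes as (height, width)
    target_sizes = torch.tensor([image.size[::-1]])
    result = extractor.post_process(outputs, target_sizes=target_sizes)[0]
    scores, labels, boxes = filter_detections(
        result["scores"].tolist(), result["labels"].tolist(), result["boxes"].tolist()
    )
    return {"scores": scores, "labels": labels, "boxes": boxes}
```

The heavy imports live inside the functions so the module can be loaded (and unit-tested) without the model dependencies present.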


I see, thank you @philschmid.
I'll try that, and if it takes too much time I'll try DETR, as I think that is already included in transformers 4.17.0 and should be good enough for my needs.

Not sure if DETR will work, since it requires timm, which is not installed by default. So a requirements.txt would be needed there as well.
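For reference, that would just be another entry in the same kind of requirements.txt shipped alongside the inference script (a sketch; the thread doesn't pin a specific timm version):

```
# code/requirements.txt for a DETR endpoint (sketch)
timm
```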

I got YOLOS to work with the custom script (mainly following Niels' notebook: Transformers-Tutorials/YOLOS_minimal_inference_example.ipynb at master · NielsRogge/Transformers-Tutorials · GitHub) and the newer transformers version.

Thank you for the hints @philschmid :hugs: