Sagemaker Serverless Inference for LayoutLMv2 model

Hi everyone,

I am experimenting with the recently released SageMaker Serverless Inference, thanks to Julien Simon’s tutorial.

Following it, I managed to train a custom DistilBERT model locally, upload it to S3, and create a serverless endpoint that works.

Right now I am pushing it further by trying it with the LayoutLMv2 model.

However, it is not clear to me how to pass inputs to it. For DistilBERT I just create an input like `test_data_16 = {'inputs': 'Amazing!'}` and pass it as JSON to the `invoke_endpoint` function.
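For reference, my working DistilBERT call looks roughly like this (a sketch; the endpoint name is a placeholder and the boto3 `sagemaker-runtime` client is assumed to be configured already):

```python
import json

# Placeholder endpoint name -- substitute your own serverless endpoint.
ENDPOINT_NAME = "distilbert-serverless"

def build_payload(text):
    # The Inference Toolkit expects pipeline-style JSON: {"inputs": ...}
    return json.dumps({"inputs": text})

def classify(runtime_client, text):
    # runtime_client is a boto3 "sagemaker-runtime" client
    response = runtime_client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=build_payload(text),
    )
    return json.loads(response["Body"].read())
```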

In LayoutLMv2 the input consists of three parts: image, text and bounding boxes. What keys do I use to pass them? Here is the link to the call of the processor

My second question: it is not clear to me how to modify the default settings of the processor when creating the endpoint. For example, I would like to set the flag `only_label_first_subword=True` by default in the processor. How do I do that?

Thanks!

Hi Elman, thanks for opening this thread, this is a super interesting topic :slight_smile:

No matter what model you deploy to a SageMaker (SM) endpoint, the input always requires preprocessing before it can be passed to the model. The reason you can just pass some text in the case of DistilBERT without having to do the processing yourself is that the SageMaker Hugging Face Inference Toolkit does all that work for you. This toolkit builds on top of the Pipeline API, which is what makes it so easy to call.

What does that mean for you when you want to use a LayoutLMV2 model? I see two possibilities:

  1. The Pipeline API offers a class for Object Detection: Pipelines. I’m not familiar with it, but I would imagine it is quite straightforward to use. Again, because the Inference Toolkit is based on Pipelines, once you figure out how to use the Pipeline API for Object Detection you can use the same call for the SM endpoint.

  2. The Inference Toolkit also allows you to provide your own preprocessing script, see more details here: Deploy models to Amazon SageMaker. That means you can process the inputs yourself before passing them to the model. What I would do (because I’m lazy) is look at an already existing demo to see how the preprocessing for a LayoutLMV2 model works, for example this one: app.py · nielsr/LayoutLMv2-FUNSD at main, and use that.

Hope this helps, please let me know how it goes and/or reach out if any questions.

Cheers
Heiko

One caveat I forgot to mention: At the moment it seems that deploying a model >512MB to a serverless endpoint can lead to an error. Fortunately there seems to be a workaround: Sagemaker Serverless Inference - #7 by philschmid
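In short, the workaround from that thread passes an environment variable to the model container. A sketch of where it plugs in (the `HuggingFaceModel` call is from the sagemaker SDK and is commented out here; all values are placeholders):

```python
# Env var from the linked workaround: limit the container to one HTTP worker.
serverless_env = {"MMS_DEFAULT_WORKERS_PER_MODEL": "1"}

# Where it plugs in (sagemaker SDK; placeholder values):
# model = HuggingFaceModel(
#     model_data="s3://my-bucket/model.tar.gz",
#     role=role,
#     env=serverless_env,
# )
```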

Just something to be aware of!

Cheers
Heiko

Thanks a lot Heiko,

Let me look into these two possibilities deeper and get back to you.

Regarding deploying a model >512MB by setting MMS_DEFAULT_WORKERS_PER_MODEL=1: do you have rough numbers on how it affects latency?

Thanks again!

Heiko @marshmellow77, upon closer inspection, using the Inference Toolkit suits me better.

However, the documentation is very scarce and I found no examples implementing all these functions like model_fn etc. for a custom model. Do you have a link to examples?

And is there a way to debug them locally, to avoid repeatedly creating endpoints for debugging?

Thanks again

Re MMS_DEFAULT_WORKERS_PER_MODEL=1 and latency, unfortunately I don’t have any data on that.

I have an example for an inference.py script here: text-summarisation-project/inference.py at main · marshmellow77/text-summarisation-project · GitHub, I hope this is useful.
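The general shape of such a script is the set of handler functions the Inference Toolkit looks for: model_fn, input_fn, predict_fn and output_fn. A bare-bones sketch (the model classes and the predict logic here are illustrative, not specific to your use case):

```python
import json

# Sketch of the handler functions the SageMaker HF Inference Toolkit
# looks for in a custom inference.py. The toolkit calls them in order:
# model_fn -> input_fn -> predict_fn -> output_fn.

def model_fn(model_dir):
    # Heavy imports kept inside the handler (assumption: packages are
    # installed via requirements.txt inside the model archive).
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    return model, tokenizer

def input_fn(request_body, content_type):
    # Deserialize the request payload.
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    return json.loads(request_body)

def predict_fn(data, model_and_tokenizer):
    # Illustrative prediction step -- adapt to your model's inputs.
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt")
    outputs = model(**inputs)
    return outputs.logits.argmax(-1).tolist()

def output_fn(prediction, accept):
    # Serialize the prediction back to JSON.
    return json.dumps(prediction)
```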

Now, re debugging - agreed that it would be ideal to locally debug the deployed inference code. I know that Sagemaker offers “local mode”, which (in theory) should allow you to do exactly that: Use the Amazon SageMaker local mode to train on your notebook instance | AWS Machine Learning Blog

So far the pain hasn’t been high enough for me to try it myself; I’ve always gotten by with print statements in the inference code. But in case you want to give local mode a try, I’d be curious to learn whether it works well for you :slight_smile:
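Short of full local mode, you can also drive the handler functions directly in a notebook or script to catch obvious bugs before deploying. A minimal sketch with stub handlers, just to show the call order the toolkit uses:

```python
import json

def smoke_test(model_fn, input_fn, predict_fn, output_fn, model_dir, body):
    """Drive the four toolkit handlers in the same order SageMaker does."""
    model = model_fn(model_dir)
    data = input_fn(body, "application/json")
    prediction = predict_fn(data, model)
    return output_fn(prediction, "application/json")

# Example with stub handlers; in practice you would import the real
# functions from your inference.py instead.
result = smoke_test(
    model_fn=lambda d: "model",
    input_fn=lambda b, ct: json.loads(b),
    predict_fn=lambda data, model: {"echo": data},
    output_fn=lambda pred, acc: json.dumps(pred),
    model_dir=".",
    body='{"inputs": "hello"}',
)
print(result)  # {"echo": {"inputs": "hello"}}
```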

Hey @mansimov, MMS_DEFAULT_WORKERS_PER_MODEL=1 should not decrease performance at all. That config basically limits the number of HTTP workers that can be started, and since AWS Lambda can handle one request at a time anyway, it’s not a problem.

Hi @philschmid and @marshmellow77

Thanks a lot for the informative comments!

I gave Serverless LayoutLMv2 a first try. I managed to create an endpoint successfully; however, after invoking it I saw an error that detectron2, which LayoutLMv2 depends on, was not found.

So I tried re-creating it by adding git+https://github.com/facebookresearch/detectron2.git to the requirements. However, this time the endpoint creation failed due to a timeout.

Failure reason
Unable to successfully stand up your model within the allotted 180 second timeout. 
Please ensure that downloading your model artifacts, starting your model container and passing the ping health checks can be completed within 180 seconds.

I looked at the logs and the latest messages I saw were about the detectron2 installation. It is very likely that the endpoint creation is timing out due to the additional time taken to install it, but I could be wrong.

I dug a bit into the documentation to see if I can increase the delay, and added the WaiterConfig flags to the waiter’s wait call:

waiter.wait(EndpointName=huggingface_endpoint_name,
           WaiterConfig={
               'Delay': 480,
               'MaxAttempts': 120
           })

However still getting the same error.

I am not sure if I am setting up these delay flags correctly, so I would appreciate your help in pointing out how to potentially resolve this timeout issue. Thanks!

Hello @mansimov,

Are there more logs available in CloudWatch? The error message you see means that SageMaker couldn’t create and test your endpoint within 180 seconds, which is the max time SageMaker allows. Note that the WaiterConfig only controls how long your client polls for the endpoint to come up; it doesn’t change that service-side limit.

Which detectron2 version are you installing?

Hi @philschmid

I tried playing more with the version of detectron2 in my requirements.txt. It turned out that changing git+https://github.com/facebookresearch/detectron2.git to https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.9/detectron2-0.6%2Bcpu-cp38-cp38-linux_x86_64.whl did the trick, and the endpoint was created successfully. Perhaps building detectron2 from source took too much time, and using the pre-built wheel helped.

Now, after invoking the updated endpoint, I saw a message about a new missing dependency:

 "message": "Failed to import transformers.models.layoutlmv2.modeling_layoutlmv2 because of the following error (look up to see its traceback):\nNo module named \u0027torchvision\u0027"

I was a bit surprised, since I did not see torchvision being used in LayoutLMv2, but I added torchvision==0.10.0 to the requirements.txt anyway.

Now the endpoint creation failed again :smiley: this time due to space not being available during the torchvision install, according to the CloudWatch logs:

ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device

I was using the 6144 MB memory size, FYI.

Considering it is quite a hassle and takes quite some time to install all these dependencies every time the endpoint is created, I wonder if it is easier to create a new inference container extending the existing HuggingFace inference container with the additional dependencies and upgraded packages.

Can you point out how to do so?

Thanks!

Ok, I see the HuggingFace CPU inference container here: https://github.com/aws/deep-learning-containers/blob/master/huggingface/pytorch/inference/docker/1.9/py3/Dockerfile.cpu

with a general overview on how to extend and test the container: https://github.com/aws/deep-learning-containers#building-your-image

let me see what I can do myself
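My understanding is that the extension itself would only be a few lines on top of the base image (untested sketch; the FROM URI is a placeholder for the real HuggingFace DLC image in ECR, and the detectron2 wheel is the one from earlier in this thread):

```dockerfile
# Placeholder -- substitute the real HF inference DLC URI for your region
FROM <huggingface-pytorch-inference-image-uri>

# Bake the extra dependencies into the image instead of installing them
# at endpoint creation time via requirements.txt
RUN pip install --no-cache-dir \
    torchvision \
    https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.9/detectron2-0.6%2Bcpu-cp38-cp38-linux_x86_64.whl
```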

@philschmid I managed to build a custom HuggingFace inference container using your latest PyTorch 1.10 commit: https://github.com/aws/deep-learning-containers/pull/1630/files

However, when setting up the endpoint it gives this error:
OSError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory

Not sure what to do, but this should help debug the issue with the latest 1.10 HuggingFace container.

Hey @mansimov,

OSError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory

yeah, that’s an issue we are also still seeing; that’s why the 1.10 containers haven’t been released.

ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device

This sounds like a good old AWS Lambda issue: in AWS Lambda you have ~500MB of temp space to store files, and my assumption is that detectron2 and torchvision might be too big for it.
But those are things we are going to add soon when adding support for the other modalities.

Thanks for the clarifications @philschmid.

I dug more into the logs from installing torchvision. It turned out that I was installing torchvision 0.10.0, which is incompatible with torch 1.9.1. This triggered a full uninstall of torch 1.9.1 in the container and a reinstall of torch 1.9.0 with all its dependencies, which perhaps took more than 500 MB to store temporarily. Simply changing torchvision to version 0.10.1, which is compatible with torch 1.9.1, resolved my space issue.

After that, the endpoint with torchvision and detectron2 was created successfully. When invoking it I saw an undefined symbol error in detectron2, which according to the authors of the library means that the installed pre-built detectron2 wheel is incompatible with the installed torch and torchvision. That makes sense, since the detectron2 wheel I am using was built for torch 1.9.0 and torchvision 0.10.0, not the 1.9.1 and 0.10.1 versions. Ehhh, version incompatibility again :smiley: :sweat_smile:

As a last-ditch attempt I will build the detectron2 wheel locally on an Ubuntu 20.04 machine with torch 1.9.1 and torchvision 0.10.1, and use that wheel in the requirements.txt. Let’s see if it works!

I will also be waiting for 1.10 container support. If you set torch to version 1.10.0, installing detectron2 in serverless would be very easy, according to the lesson above!

Update:

I managed to create the Serverless endpoint successfully!

Using a custom wheel of detectron2 built for PyTorch 1.9.1 and torchvision 0.10.1 worked. I will keep the link here in case others need it for their projects: http://mansimov.io/files/detectron2-0.6-cp38-cp38-linux_x86_64.whl
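In requirements.txt terms, the working combination ended up being:

```
torchvision==0.10.1
http://mansimov.io/files/detectron2-0.6-cp38-cp38-linux_x86_64.whl
```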

I will likely create a small GitHub repo in the future so others can reproduce LayoutLMv2 (and related models) on AWS SageMaker Serverless.

Thanks @philschmid and @marshmellow77 for help!

Hey @mansimov,

That’s great to hear!!

I hope we have better support for vision soon to make it easier for other users.

I am trying to do the same with LayoutXLM. Did you end up making a GitHub repo?