Hi everyone,
I am experimenting with the recently released SageMaker Serverless Inference, thanks to Julien Simon's tutorial.
Following it, I managed to train a custom DistilBERT model locally, upload it to S3, and create a serverless endpoint that works.
Right now I am pushing it further by trying it with the LayoutLMv2 model.
However, it is not clear to me how to pass inputs to it. For example, with DistilBERT I just create the input like this: test_data_16 = {'inputs': 'Amazing!'} and pass it as JSON to the invoke_endpoint function.
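For reference, this is roughly what that call looks like with boto3 (the endpoint name here is just a placeholder):

import json
import boto3

runtime = boto3.client("sagemaker-runtime")
test_data_16 = {"inputs": "Amazing!"}

response = runtime.invoke_endpoint(
    EndpointName="distilbert-serverless-endpoint",  # placeholder name
    ContentType="application/json",
    Body=json.dumps(test_data_16),
)
print(json.loads(response["Body"].read()))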
In LayoutLMv2 the input consists of three parts: image, text, and bounding boxes. What keys do I use to pass them? Here is the link to the call of the processor.
Second question: it is not clear to me how to modify the default settings of the processor when creating the endpoint. For example, I would like to set the flag only_label_first_subword to True by default in the processor. How do I do that?
Thanks!
Hi Elman, thanks for opening this thread; this is a super interesting topic!
No matter what model you deploy to a SageMaker (SM) endpoint, the input always requires preprocessing before it can be passed to the model. The reason you can just pass some text in the case of DistilBERT, without having to do the processing yourself, is that the SageMaker Hugging Face Inference Toolkit does all the work for you. This toolkit builds on top of the Pipeline API, which is what makes it so easy to call.
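For illustration, the default handler does roughly the equivalent of this behind the scenes (the model name here is just an example):

from transformers import pipeline

# the pipeline takes care of tokenization, inference and post-processing
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("Amazing!"))  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]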
What does that mean for you when you want to use a LayoutLMV2 model? I see two possibilities:
- The Pipeline API offers a class for Object Detection: Pipelines. I'm not familiar with it, but I would imagine that it is quite straightforward to use. Again, because the Inference Toolkit is based on Pipelines, once you figure out how to use the Pipeline API for Object Detection you can use the same call for the SM endpoint.
- The Inference Toolkit also allows you to provide your own preprocessing script; see more details here: Deploy models to Amazon SageMaker. That means you can process the inputs yourself before passing them to the model. What I would do (because I'm lazy) is look at an already existing demo to see how the preprocessing for a LayoutLMv2 model works, for example this one: app.py · nielsr/LayoutLMv2-FUNSD at main, and use that (a minimal sketch is below).
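Here is a rough, untested sketch of what such an inference.py could look like for LayoutLMv2 (the request keys, the base64-encoded image, and apply_ocr=False on the processor are my assumptions, not a fixed convention):

# inference.py, shipped in the model archive under code/
import base64
import io

import torch
from PIL import Image
from transformers import LayoutLMv2ForTokenClassification, LayoutLMv2Processor


def model_fn(model_dir):
    # assumes the processor (saved with apply_ocr=False) and the model are in the artifact;
    # the tokenizer could also be created here with only_label_first_subword=True if needed
    processor = LayoutLMv2Processor.from_pretrained(model_dir)
    model = LayoutLMv2ForTokenClassification.from_pretrained(model_dir)
    return model, processor


def predict_fn(data, model_and_processor):
    model, processor = model_and_processor
    # assumed request body: {"image": <base64 string>, "words": [...], "boxes": [...]}
    image = Image.open(io.BytesIO(base64.b64decode(data["image"]))).convert("RGB")
    encoding = processor(image, data["words"], boxes=data["boxes"], return_tensors="pt")
    with torch.no_grad():
        outputs = model(**encoding)
    return {"predictions": outputs.logits.argmax(-1).squeeze().tolist()}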
Hope this helps. Please let me know how it goes and/or reach out if you have any questions.
Cheers
Heiko
One caveat I forgot to mention: At the moment it seems that deploying a model >512MB to a serverless endpoint can lead to an error. Fortunately there seems to be a workaround: Sagemaker Serverless Inference - #7 by philschmid
Just something to be aware of!
Cheers
Heiko
Thanks a lot Heiko,
Let me look into these two possibilities deeper and get back to you.
Regarding deploying a model >512 MB by setting MMS_DEFAULT_WORKERS_PER_MODEL=1: do you have rough numbers on how it affects latency?
Thanks again!
Heiko @marshmellow77, upon closer inspection, using the Inference Toolkit suits me better.
However, the documentation is very sparse and I could not find any examples implementing all these functions like model_fn etc. for a custom model. Do you have a link to examples?
And is there a way to debug them locally, to avoid repeatedly creating endpoints for debugging?
Thanks again
Re MMS_DEFAULT_WORKERS_PER_MODEL=1 and latency, unfortunately I don't have any data on that.
I have an example of an inference.py script here: text-summarisation-project/inference.py at main · marshmellow77/text-summarisation-project · GitHub. I hope this is useful.
Now, re debugging: agreed that it would be ideal to debug the deployed inference code locally. I know that SageMaker offers "local mode", which (in theory) should allow you to do exactly that: Use the Amazon SageMaker local mode to train on your notebook instance | AWS Machine Learning Blog
So far the pain hasn't been high enough for me to try it myself; I've always gotten by with print statements in the inference code. But in case you want to give local mode a try, I'd be curious to learn whether it works well for you.
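From what I understand, local mode is mostly a matter of deploying with a local instance type, roughly like this (untested sketch; the model path, role and versions are placeholders to adjust to a supported container combination):

from sagemaker.huggingface import HuggingFaceModel

# requires the local extras: pip install "sagemaker[local]" (and Docker running locally)
huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/model.tar.gz",                  # placeholder
    role="arn:aws:iam::111122223333:role/my-sagemaker-role",   # placeholder
    transformers_version="4.12.3",
    pytorch_version="1.9.1",
    py_version="py38",
)

# instance_type="local" spins the inference container up on your own machine
predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="local")
print(predictor.predict({"inputs": "Amazing!"}))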
Hey @mansimov,
MMS_DEFAULT_WORKERS_PER_MODEL=1 should not decrease performance at all. That config basically limits the number of HTTP workers that can be started, and since AWS Lambda can only handle one request at a time anyway, it is not a problem.
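For anyone wondering where to set it: it goes into the env of the HuggingFaceModel, roughly like this (model data, role and versions are placeholders):

from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/model.tar.gz",    # placeholder
    role="<your-sagemaker-execution-role>",      # placeholder
    transformers_version="4.12.3",
    pytorch_version="1.9.1",
    py_version="py38",
    env={"MMS_DEFAULT_WORKERS_PER_MODEL": "1"},  # cap MMS at a single HTTP worker
)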
Hi @philschmid and @marshmellow77
Thanks a lot for the informative comments!
I gave serverless LayoutLMv2 a first try. I managed to create an endpoint successfully; however, after invoking it I saw an error that detectron2, which LayoutLMv2 depends on, was not found.
So I tried re-creating the model archive, adding git+https://github.com/facebookresearch/detectron2.git to the requirements.txt. However, this time the endpoint creation failed due to a timeout.
Failure reason:
Unable to successfully stand up your model within the allotted 180 second timeout.
Please ensure that downloading your model artifacts, starting your model container and passing the ping health checks can be completed within 180 seconds.
I looked at the logs and the latest messages I saw were about the detectron2 installation. It is very likely that the endpoint creation is timing out because of the additional time it takes to install it, but I could be wrong.
I dug a bit into the documentation to see if I could increase the delay and added a WaiterConfig to the waiter's wait call:
waiter.wait(
    EndpointName=huggingface_endpoint_name,
    WaiterConfig={
        'Delay': 480,       # seconds between polling attempts
        'MaxAttempts': 120  # maximum number of polling attempts
    }
)
However, I am still getting the same error.
I am not sure if I am setting these delay flags correctly, so I would appreciate your help in pointing out how to resolve this timeout issue. Thanks!
Hello @mansimov,
Are there more logs available in CloudWatch? The error message you see means that SageMaker couldn't create and test your endpoint within 180 seconds, which is the maximum time SageMaker allows.
Which detectron2 version are you installing?
Hi @philschmid
I tried playing around more with the version of detectron2 I was using in the requirements.txt. It turned out that changing git+https://github.com/facebookresearch/detectron2.git to https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.9/detectron2-0.6%2Bcpu-cp38-cp38-linux_x86_64.whl did the trick and the endpoint was created successfully. Perhaps building detectron2 from source took too much time, and using the pre-built wheel helped.
Now, after invoking the updated endpoint, I saw a message about a new missing dependency:
"message": "Failed to import transformers.models.layoutlmv2.modeling_layoutlmv2 because of the following error (look up to see its traceback):\nNo module named \u0027torchvision\u0027"
I was a bit surprised since I did not see torchvision being used in LayoutLMv2, but I added torchvision==0.10.0 to the requirements.txt anyway.
The endpoint creation then failed again, this time because no space was left during the torchvision install, according to the CloudWatch logs:
ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device
I was using the 6144 MB memory size, FYI.
Considering it is quite a hassle and takes quite some time to install all these dependencies every time the endpoint is created, I wonder if it is easier to create a new inference container that extends the existing Hugging Face inference container with the additional dependencies and upgraded packages.
Can you point out how to do so?
Thanks!
@philschmid I managed to build a personal Hugging Face inference container using your latest PyTorch 1.10 commit https://github.com/aws/deep-learning-containers/pull/1630/files
However, when setting up the endpoint it gives this error:
OSError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory
Not sure what to do about it, but this should help debug the issue with the latest 1.10 Hugging Face container.
Hey @mansimov,
OSError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory
Yeah, that's an issue we are also still seeing; that's why the 1.10 containers haven't been released yet.
ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device
This sounds like a good old AWS Lambda issue: in AWS Lambda you have ~500 MB of temp space to store files. My assumption is that detectron2 and torchvision might be too big for it.
But those are things we are going to address soon when we add support for the other modalities.
Thanks for the clarifications, @philschmid.
I dug more into the logs from the torchvision install. It turned out that I was installing torchvision 0.10.0, which is incompatible with torch 1.9.1. This triggered a full uninstall of torch 1.9.1 in the container and a reinstall of torch 1.9.0 with all its dependencies, which probably took more than 500 MB of temporary storage. Simply changing torchvision to version 0.10.1, which is compatible with torch 1.9.1, resolved my space issue.
After that, the endpoint with torchvision and detectron2 was created successfully. When invoking it I saw an undefined-symbol issue in detectron2, which, according to the authors of the library, means that the installed pre-built detectron2 wheel is incompatible with the torch and torchvision versions. That makes sense, since the detectron2 wheel I am using was built for torch 1.9.0 and torchvision 0.10.0, not the 1.9.1 and 0.10.1 versions. Ehhh, version incompatibility again.
As a last-ditch attempt I will build the detectron2 wheel locally on an Ubuntu 20.04 machine against torch 1.9.1 and torchvision 0.10.1 and use that wheel in the requirements.txt. Let's see if it works!
Also, I will be waiting for 1.10 container support. If you set torch to version 1.10.0, installing detectron2 in serverless would be very easy, according to the above lesson!
Update:
I managed to create the Serverless endpoint successfully!
Using a custom detectron2 wheel built for PyTorch 1.9.1 and torchvision 0.10.1 worked. I will keep the link here in case others need it for their projects: http://mansimov.io/files/detectron2-0.6-cp38-cp38-linux_x86_64.whl
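For completeness, the requirements.txt that ended up working looks roughly like this (torch 1.9.1 is already in the container):

torchvision==0.10.1
http://mansimov.io/files/detectron2-0.6-cp38-cp38-linux_x86_64.whl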
I will likely create a small GitHub repo so others can reproduce LayoutLMv2 (and related models) on AWS SageMaker Serverless in the future.
Thanks @philschmid and @marshmellow77 for the help!
Hey @mansimov,
That’s great to hear!!
I hope we have better support for vision soon to make it easier for other users.
I am trying to do the same with LayoutXLM. Did you end up making a GitHub repo?
Hello @mansimov ,
Please share your solution for creating the input data (images, boxes, tokens).
Thanks in advance!