Databricks model deployments to SageMaker are not working

For the Databricks models, none of the examples for deploying on Amazon SageMaker work. The suggested parameters for deploying as a SageMaker endpoint do not work: the endpoint deployment itself succeeds, but calling the endpoint fails. The suggested parameters for instantiating sagemaker.HuggingFaceModel seem off, and the suggested instance type is definitely too weak.
After adjusting the suggested deployment config to 'transformers_version': '4.26.0', 'pytorch_version': '1.13.1', and a more powerful instance type, I am now told that I need to set the option trust_remote_code=True when the pipeline is called from inside the Docker image deployed on the endpoint. How can I get that parameter passed as True? If that is not possible, then none of the Dolly models are usable as out-of-the-box SageMaker deployments.


Can you please share the code you used to deploy the model you are talking about? It's hard to understand your issue without any details.

The code is already there; it is the snippet suggested on the Hugging Face site. Go to any of the Databricks models (e.g. databricks/dolly-v2-12b · Hugging Face), open the Deploy dropdown in the upper right, select Amazon SageMaker (the only option, actually), then choose pretty much any task (I tried: text generation on AWS). Try to run that code as is. The model deploys, but it then fails on the predict call.
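For reference, this is roughly what the suggested snippet looks like; the exact transformers/pytorch versions and instance type may differ from what the site currently shows, so treat the values below as illustrative:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Hub configuration taken from the model card's Deploy dropdown
hub = {
    'HF_MODEL_ID': 'databricks/dolly-v2-12b',
    'HF_TASK': 'text-generation',
}

# the versions below are placeholders for whatever the snippet suggests
huggingface_model = HuggingFaceModel(
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    env=hub,
    role=role,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',  # the suggested (too weak) instance type
)

# this is the call that fails with the KeyError shown in the logs below
predictor.predict({'inputs': 'Explain the difference between nuclear fission and fusion.'})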

cloudwatch logs:

com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 372, in __getitem__
com.amazonaws.ml.mms.wlm.WorkerLifeCycle - raise KeyError(key)
[INFO ] W-databricks__dolly-v2-3b-4-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - KeyError: 'gpt_neox'
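
If I understand the error correctly, the transformers version pinned in the suggested container simply predates gpt_neox support, so the config lookup fails. A quick local sanity check (outside SageMaker) that illustrates this, assuming a recent transformers install:

from transformers import AutoConfig

# resolves fine on a recent transformers release; on the older version
# bundled with the suggested container the same lookup raises KeyError: 'gpt_neox'
cfg = AutoConfig.from_pretrained('databricks/dolly-v2-3b')
print(cfg.model_type)  # gpt_neox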

Fiddling with the suggested parameters does seem to fix the gpt_neox KeyError. These are the adjustments I made:

# only transformers_version, pytorch_version, and py_version changed;
# hub and role are the same as in the suggested snippet
huggingface_model = HuggingFaceModel(
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    env=hub,
    role=role,
)

But that brings us to the following error (I also tried with an ml.g5.8xlarge instance type - same error):

com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/__init__.py", line 704, in pipeline
com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ValueError: Loading this pipeline requires you to execute the code in the pipeline file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option trust_remote_code=True to remove this error.

This has to do with the way the pipeline is called from within the SageMaker Docker image. For some reason it is considered a "custom" pipeline, and one has to explicitly set trust_remote_code=True when calling it. How can I get the already built and published Docker image to pass trust_remote_code=True when calling the internal prediction pipeline? Or maybe I am doing something else wrong and that is not necessary at all.
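
The only workaround I can think of is to override the toolkit's model loading with a custom inference script. This is just a sketch of that idea, not something I have verified end to end: it assumes you repackage the model into a model.tar.gz that contains a code/inference.py and point HuggingFaceModel at that artifact via model_data instead of the hub env variables:

# code/inference.py (hypothetical) - override model_fn so the pipeline is
# created with trust_remote_code=True inside the container
from transformers import pipeline

def model_fn(model_dir):
    # model_dir contains the unpacked weights from model.tar.gz;
    # trust_remote_code=True allows the repo's custom pipeline code to load
    return pipeline('text-generation', model=model_dir, trust_remote_code=True)

def predict_fn(data, pipe):
    # minimal request handling: expects {"inputs": "...", "parameters": {...}}
    inputs = data.pop('inputs', data)
    params = data.pop('parameters', {})
    return pipe(inputs, **params)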


Anybody figure this one out?

As expected, it looks like it gets a little hacky. But here is a solution.

@mcapizzi the link you shared is the current best way. Since we don't yet have a way to tell the toolkit that you are okay with using "remote_code", let me add that to our backlog.

Thanks @philschmid ! I’m sure it’s impossible to accommodate every model and type so I’d understand if the subset of models that require “remote code” is so small that it’s not a priority. But I love the current functionality as it is! Got me up and running on some deployment tests in less than an hour.
