Serverless memory problem when deploying Wav2Vec2 with custom inference code

Are you providing the model via an S3 URI or the hub configuration? You have to provide it via an s3:// path as model_data, since the hub configuration does not load the kenlm model.
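
For reference, the archive would typically bundle the processor files, the kenlm language model, and the custom inference code in a layout like the one below; the exact file names (e.g. 5gram.bin) are illustrative:

model.tar.gz
├── config.json
├── pytorch_model.bin
├── preprocessor_config.json
├── tokenizer_config.json
├── vocab.json
├── alphabet.json              # decoder alphabet for Wav2Vec2ProcessorWithLM
├── language_model/
│   ├── 5gram.bin              # kenlm model (illustrative name)
│   ├── attrs.json
│   └── unigrams.txt
└── code/
    ├── inference.py           # custom inference code
    └── requirements.txt       # extra pip dependencies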

I’m providing the model via S3:

from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.serializers import DataSerializer

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.17.0',  # transformers version of the inference container
	pytorch_version='1.10.2',       # pytorch version of the inference container
	py_version='py38',              # python version of the inference container
	model_data='s3://sagemaker-us-east-2-094463604469/model.tar.gz',  # S3 URI of the model archive
	role=role,                      # IAM role with SageMaker permissions
)
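
For the serverless part, the deploy call would look roughly like this; memory_size_in_mb=6144 is the current serverless maximum and an assumption on my side, but raising it is usually the first thing to try when the container runs out of memory:

from sagemaker.serverless import ServerlessInferenceConfig

# serverless endpoint configuration; 6144 MB is the maximum memory size
# (an assumption here) -- raise this if the worker is killed for memory
serverless_config = ServerlessInferenceConfig(
	memory_size_in_mb=6144,
	max_concurrency=1,
)

# DataSerializer lets the predictor send raw audio bytes to the endpoint
audio_serializer = DataSerializer(content_type='audio/x-audio')

predictor = huggingface_model.deploy(
	serverless_inference_config=serverless_config,
	serializer=audio_serializer,
)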

You might also need to provide a requirements.txt to install transformers==4.19.2; I’m not sure the LM-boosted decoding was already available in 4.17.
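
As a sketch, the code/requirements.txt inside the archive could look like this; installing kenlm from the GitHub archive is the route pyctcdecode commonly documents, so treat that line as an assumption:

# newer transformers with LM-boosted Wav2Vec2 decoding
transformers==4.19.2
# decoder and kenlm bindings used by Wav2Vec2ProcessorWithLM
pyctcdecode
https://github.com/kpu/kenlm/archive/master.zip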

Even after adding transformers==4.19.2 as a dependency in requirements.txt, I get the same problem. I’m using transformers 4.17 on my local machine and it works fine; I only run into problems when the model is deployed on SageMaker.