Thank you for the guidance; I tried the directions in the link and got the same result.
I am doing a git checkout of the specified model (distilbert-base-uncased in this case), creating a model.tar.gz, and uploading it to S3 as the target model (for this use case I am skipping training).
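The packaging step looks roughly like the sketch below. The bucket and prefix names are placeholders, and it assumes the model repo has already been cloned from the Hub into a local distilbert-base-uncased directory:

import tarfile
import boto3

s3_bucket = "my-bucket"        # placeholder
s3_prefix = "nlp-serverless"   # placeholder

# Tar up the cloned model directory (config.json, pytorch_model.bin, tokenizer files, ...)
# with the files at the root of the archive
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("distilbert-base-uncased", arcname=".")

# Upload the archive to S3 so it can be referenced as model_data / ModelDataUrl
boto3.client("s3").upload_file("model.tar.gz", s3_bucket, f"{s3_prefix}/model.tar.gz")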
The model, endpoint config, and endpoint code blocks are below:
from sagemaker.huggingface import HuggingFaceModel

# SDK model definition pointing at the packaged model in S3
huggingface_model = HuggingFaceModel(
    model_data=f"s3://{s3_bucket}/{s3_prefix}/model.tar.gz",
    role=role,
    transformers_version="4.12.3",
    pytorch_version="1.9.1",
    py_version="py38",
    env={
        "HF_TASK": "text-classification"
    },
)
import boto3
from time import gmtime, strftime

client = boto3.client("sagemaker")

# Register the model with SageMaker, pointing at the HF inference container image
huggingface_model_config = client.create_model(
    ModelName="nlp-serverless-model-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime()),
    ExecutionRoleArn=role,
    Containers=[
        {
            "Image": "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.9.1-transformers4.12.3-cpu-py38-ubuntu20.04",
            "Mode": "SingleModel",
            "ModelDataUrl": model_file,  # S3 URI of the model.tar.gz
            "Environment": {"MMS_DEFAULT_WORKERS_PER_MODEL": "1"},
        }
    ],
)
# Endpoint config with a single serverless variant (6 GB memory, max concurrency 10)
endpoint_config_response = client.create_endpoint_config(
    EndpointConfigName=epc_name,
    ProductionVariants=[
        {
            "VariantName": "single-variant",
            "ModelName": sm_model_name.split("/")[1],
            "ServerlessConfig": {
                "MemorySizeInMB": 6144,
                "MaxConcurrency": 10,
            },
        },
    ],
)
create_endpoint_response = client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=epc_name,
)
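For reference, once the endpoint is InService I exercise it with the SageMaker runtime client, roughly as below (sketch; the payload shape assumes the text-classification task set via HF_TASK):

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Invoke the serverless endpoint with a simple text-classification payload
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps({"inputs": "This is a test sentence."}),
)
print(response["Body"].read().decode("utf-8"))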