Deploying TheBloke/Luna-AI-Llama2-Uncensored-GGML

I’m trying to deploy the Hugging Face model TheBloke/Luna-AI-Llama2-Uncensored-GGML to AWS SageMaker.

I created a domain, launched the studio, and opened a new notebook:

Image: Data Science 3.0
Kernel: Python 3

I tried running the following code:

```python
import json

import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'TheBloke/Luna-AI-Llama2-Uncensored-GGML',
    'SM_NUM_GPUS': json.dumps(1)
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.9.3"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# send request
predictor.predict({
    "inputs": "My name is Clara and I am",
})
```
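For reference, this is the response shape I expected: a healthy TGI-backed endpoint normally returns a list of dicts with a `generated_text` field (the text below is made up for illustration, not real model output):

```python
# Illustrative only: the usual shape of a TGI endpoint response.
# The generated text here is a made-up placeholder.
response = [{"generated_text": "My name is Clara and I am a software engineer."}]

# The completion is the "generated_text" field of the first element.
completion = response[0]["generated_text"]
print(completion)
```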

I’m getting the following errors and a warning:

```
UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2023-09-10-11-59-20-948: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint…
```

```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
distributed 2022.7.0 requires tornado<6.2,>=6.0.3, but you have tornado 6.3.2 which is incompatible.

WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
```

I checked the CloudWatch logs following the instructions in the first error, and I found many DownloadError entries for different files. For example:

```
Error: DownloadError
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 182, in download_weights
    utils.convert_files(local_pt_files, local_st_files, discard_names)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 106, in convert_files
    convert_file(pt_file, sf_file, discard_names)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 65, in convert_file
    loaded = torch.load(pt_file, map_location="cpu")
  File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)

2023-09-12T02:08:45.377+08:00 _pickle.UnpicklingError: could not find MARK
```
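My working theory (an assumption on my part, not something I've confirmed) is that TGI only serves PyTorch/safetensors weights, so its download step is trying to `torch.load` the repo's GGML `.bin` files as if they were pickles. The exact "could not find MARK" message seems consistent with that: a GGJT-v3 GGML file starts with the magic number 0x67676a74, stored little-endian on disk as the bytes `b'tjgg'`, and the first byte 0x74 (`'t'`) happens to be pickle's TUPLE opcode, which pops back to a MARK that was never pushed. A minimal reproduction with a stand-in header:

```python
import io
import pickle

# Stand-in for the start of a GGJT-v3 GGML file: the magic 0x67676a74
# written little-endian (b'tjgg'), followed by a fake version field.
# This is NOT a real model file, just the first few header bytes.
fake_ggml_header = b'tjgg' + b'\x03\x00\x00\x00'

# torch.load's legacy path calls pickle_module.load(f) on these bytes.
# Byte 0x74 ('t') is pickle's TUPLE opcode, which requires a MARK on the
# stack -- there is none, reproducing the error from the CloudWatch log.
try:
    pickle.load(io.BytesIO(fake_ggml_header))
except pickle.UnpicklingError as exc:
    print(exc)  # could not find MARK
```

If that reading is right, the fix would presumably be to serve the non-GGML (PyTorch or safetensors) variant of this model, since the GGML files can never pass TGI's conversion step.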