SageMaker Pipelines with finetuned Llama 2

Hello, I’m trying to deploy a finetuned Llama 2 model to SageMaker using SageMaker Pipelines.
The pipeline looks something like this:

import json
from datetime import datetime

from sagemaker.huggingface import (
    HuggingFace,
    HuggingFaceModel,
    get_huggingface_llm_image_uri,
)
from sagemaker.workflow.model_step import ModelStep
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

# environment, region, hf_token, role, pipeline_session, output_path and tags
# are defined elsewhere
model_package_group_name = f"Llama2-qlora-{environment}"
model_id = 'meta-llama/Llama-2-7b-hf'

llm_image = get_huggingface_llm_image_uri(
    "huggingface",
    region=region
)

pytorch_version="2.0"
transformers_version="4.28"
python_version="py310"
entry_point="finetune.py"
source_dir="code"

# Pipeline Input Parameters
training_instance_type = ParameterString(name="TrainingInstanceType", default_value="ml.g5.4xlarge")
inference_instance_type = ParameterString(name="InferenceInstanceType", default_value="ml.g5.4xlarge")
model_approval_status = ParameterString(
    name="ModelApprovalStatus", default_value="PendingManualApproval"
)

# Define Training Step

# hyperparameters, which are passed into the training job
hyperparameters = {
    'model_id': model_id,                             # pre-trained model
    'hf_token': hf_token,                             # huggingface token to access llama 2
}

# create the Estimator
huggingface_estimator = HuggingFace(
    entry_point          = entry_point,             # training script
    source_dir           = source_dir,              # directory which includes all the files needed for training
    instance_type        = training_instance_type,  # instance type used for the training job (pipeline parameter)
    instance_count       = 1,                       # the number of instances used for training
    base_job_name        = 'llama2-qlora',          # the name of the training job
    role                 = role,                    # IAM role used in the training job to access AWS resources, e.g. S3
    sagemaker_session    = pipeline_session,        # sagemaker session used to execute the training job
    volume_size          = 300,                     # the size of the EBS volume in GB
    transformers_version = transformers_version,    # the transformers version used in the training job
    pytorch_version      = pytorch_version,         # the pytorch version used in the training job
    py_version           = python_version,          # the python version used in the training job
    hyperparameters      = hyperparameters,         # the hyperparameters passed to the training job
    environment          = { "HUGGINGFACE_HUB_CACHE": "/tmp/.cache" },  # cache models in /tmp
    output_path          = output_path,             # path to which the trained model will be saved
)

step_train = TrainingStep(
    name="TrainModel",
    step_args=huggingface_estimator.fit(),
)

# Define Create Model Step

hf_env = {
    'HF_TASK': 'text-generation',
    'HF_MODEL_ID': step_train.properties.ModelArtifacts.S3ModelArtifacts
}

model = HuggingFaceModel(
    name=f"Llama2-qlora-{environment}",
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    image_uri=llm_image,
    role=role,
    sagemaker_session=pipeline_session,
    env=hf_env
)

# Define Register Model Step

register_model_args = model.register(
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=[inference_instance_type],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
)
step_register_model = ModelStep(
    name="RegisterModel", 
    step_args=register_model_args,
    depends_on=[step_train]
)


# Define Pipeline

pipeline_name = f"Llama2-qlora-{environment}"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        training_instance_type,
        inference_instance_type,
        model_approval_status,
    ],
    steps=[step_train, step_register_model],
)

definition = json.loads(pipeline.definition())
print("Pipeline Definition:")
print(definition)

pipeline.upsert(role_arn=role, tags=tags)
execution = pipeline.start(execution_description=f"{pipeline_name} {datetime.now()}")

finetune.py downloads the dolly-15k dataset, finetunes the model, and saves the output to /opt/ml/model. The files are successfully copied to S3 after training and the pipeline runs to completion, but I’m struggling to deploy an endpoint from the model package created here (my deployment attempt is sketched after the error below).
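For context, here is a stripped-down, hypothetical sketch of what finetune.py does, assuming a standard transformers + peft setup (the tokenization and Trainer loop are elided):

import argparse

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

def main():
    parser = argparse.ArgumentParser()
    # SageMaker passes the estimator hyperparameters as CLI arguments
    parser.add_argument("--model_id", type=str)
    parser.add_argument("--hf_token", type=str)
    args = parser.parse_args()

    dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

    tokenizer = AutoTokenizer.from_pretrained(args.model_id, use_auth_token=args.hf_token)
    model = AutoModelForCausalLM.from_pretrained(args.model_id, use_auth_token=args.hf_token)
    model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM"))

    # ... tokenize `dataset` and run the Trainer loop (elided) ...

    # everything written to /opt/ml/model is tarred up and copied to S3
    tokenizer.save_pretrained("/opt/ml/model")
    model.save_pretrained("/opt/ml/model")  # saves the LoRA adapter (or merged weights)

if __name__ == "__main__":
    main()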

I’m getting the following error:

huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 's3://my-s3-bucket/llama2_qlora/output/pipelines-cbul1opw9ds4-TrainModel-QNtxrf9VzY/output/model.tar.gz'. Use `repo_type` argument if needed.
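For completeness, the deployment attempt looks roughly like this (a simplified sketch, not my exact code; the package ARN is a placeholder for the package registered by the pipeline, and role / sagemaker_session here are a regular execution role and Session rather than the pipeline session):

from sagemaker import ModelPackage

model_package = ModelPackage(
    role=role,
    model_package_arn="arn:aws:sagemaker:<region>:<account>:model-package/<group>/<version>",  # placeholder
    sagemaker_session=sagemaker_session,
)
predictor = model_package.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",
)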

Any idea what could be wrong? I can’t find any documentation on how to use SageMaker Pipelines for this. Usually, HF_MODEL_ID is set to /opt/ml/model, because training and inference happen within the same session.
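That is, outside of Pipelines I would normally point the container at the extracted artifacts instead of an S3 URI, something like:

hf_env = {
    'HF_TASK': 'text-generation',
    'HF_MODEL_ID': '/opt/ml/model',  # model_data gets extracted here inside the container
}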