Multi-LoRA serving with adapters on S3

I’ve read this great article and want to deploy a base model (Qwen2.5 7B) and serve it with different LoRA adapters. The model was fine-tuned with SageMaker and the artifacts are saved in S3.

I created the model like this:

from sagemaker.huggingface import HuggingFaceModel

# TGI config: base model from the local artifact path, one LoRA adapter referenced by a path relative to the model directory
config = {
  'HF_MODEL_ID': "/opt/ml/model",
  'LORA_ADAPTERS': 'adapter_id=adapters/my_adapter_1'
}

# role and llm_image (the TGI inference image URI) are defined earlier in my notebook
llm_model = HuggingFaceModel(
  role=role,
  # uncompressed artifacts: point at the S3 prefix that holds the model files
  model_data={
    'S3DataSource': {
      'S3Uri': "s3://.../qwen-25-7-all-linear-.../output/model/",
      'S3DataType': 'S3Prefix',
      'CompressionType': 'None'
    }
  },
  image_uri=llm_image,
  env=config
)
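
Roughly, the deploy call looks like this (the instance type and timeout below are placeholders, not necessarily what I used):

# Deploy to a real-time endpoint; ml.g5.2xlarge is just an example choice for a 7B model
llm = llm_model.deploy(
  initial_instance_count=1,
  instance_type="ml.g5.2xlarge",
  container_startup_health_check_timeout=600,
)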

The structure of my S3 bucket is like this:

model/
--- all model files
--- adapters/
------ my_adapter_1/
--------- adapter_config.json
--------- adapter_model.safetensors
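
My understanding is that the whole S3 prefix gets copied to /opt/ml/model inside the container, so I also considered pointing LORA_ADAPTERS at the absolute container path instead of a relative one (just a guess on my side, not verified):

# Variant I considered: reference the adapter by the absolute path where I
# expect it to land inside the container (assumption, not verified)
config = {
  'HF_MODEL_ID': "/opt/ml/model",
  'LORA_ADAPTERS': 'adapter_id=/opt/ml/model/adapters/my_adapter_1'
}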

When I try to deploy the endpoint, I get this error:

UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2024-11-12-20-51-47-645: Failed. Reason: error: Key of model data S3 object 's3://sagemaker-ca-central-1-.../qwen-25-7-all-linear-.../output/model/adapters/' maps to invalid local file path..
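
For completeness, once the endpoint is up I plan to select an adapter per request roughly like this (following the TGI multi-LoRA article; the prompt and parameter values are placeholders):

# Per-request adapter selection, as described in the TGI multi-LoRA article.
# The value of "adapter_id" must match the name on the left side of '=' in
# LORA_ADAPTERS (literally 'adapter_id' in my config above).
response = llm.predict({
  "inputs": "What is the capital of France?",
  "parameters": {
    "adapter_id": "adapter_id",
    "max_new_tokens": 128
  }
})
print(response)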

Any guidance on how to set this up and organize the files properly?

thanks!
