QLoRA-trained LLaMA-2 13B deployment error on SageMaker using the text-generation-inference image

I haven't been able to deploy with the new Hugging Face LLM container (version 0.8.2) on SageMaker. However, I managed to deploy the trained output model using custom inference code, i.e. a `model_fn(model_dir)` and a `predict_fn(data, model_and_tokenizer)` in an `inference.py` script (sketched below).
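For reference, this is a minimal sketch of the kind of `inference.py` that worked for me with the standard Hugging Face inference container. It assumes the QLoRA adapter was already merged into the base weights before packaging `model.tar.gz`, so a plain Transformers checkpoint is loaded; the payload shape (`inputs` / `parameters`) and dtype/device settings are illustrative, not the only way to do it:

```python
# inference.py -- custom handlers for the SageMaker Hugging Face inference toolkit.
# Assumes the QLoRA adapter was merged into the base model before packaging,
# so model_dir contains a regular Transformers checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def model_fn(model_dir):
    """Load model and tokenizer from the unpacked model.tar.gz."""
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        torch_dtype=torch.float16,  # fits 13B on fewer GPUs; requires `accelerate`
        device_map="auto",
    )
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    """Generate a completion for the incoming JSON payload."""
    model, tokenizer = model_and_tokenizer
    prompt = data.pop("inputs")            # assumed payload key
    params = data.pop("parameters", {})    # e.g. {"max_new_tokens": 256}
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, **params)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return {"generated_text": text}
```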

I opened a similar issue here:

If you find a solution, please let me know.