Error loading finetuned llama2 model while running inference

To anyone else facing this problem: it works fine on a plain old EC2 instance with TGI v1.0.0. That's because Text Generation Inference added support for Llama 2 in v0.9.3, while the SageMaker Python SDK at the time only recognised TGI versions up to 0.8.2. I used a g4dn.12xlarge instance.
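For reference, a minimal sketch of how that EC2 workaround looks, using the standard TGI Docker invocation pinned to v1.0.0 (the model ID and volume path here are placeholders — substitute your own finetuned model or a local path containing its weights):

```shell
# Run TGI v1.0.0 directly on the EC2 instance (requires NVIDIA drivers + nvidia-container-toolkit).
# $PWD/data is mounted so downloaded weights persist across container restarts.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-generation-inference:1.0.0 \
  --model-id your-org/your-finetuned-llama2   # placeholder: your model repo or local path

# Then query the server:
curl http://localhost:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Hello", "parameters": {"max_new_tokens": 20}}'
```

Pinning the image tag explicitly is the key point — it sidesteps the SDK's version allowlist entirely.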