QLoRA-trained LLaMA 2 13B deployment error on SageMaker using the text generation inference image

@elodium I ended up building the 0.9.3 image from scratch (which was half a day of work between the actual compiling and figuring out some build configurations to keep the build from freezing or exhausting memory, even with 100 GB of RAM on EC2).

I ended up deploying the TGI 0.9.3 image I built on a g5.4xlarge, and it worked. The only issue was that even though I deployed the 4-bit QLoRA LLaMA 2 13B, generation was pretty slow, and it would often freeze on the SageMaker deployment or time out after 30s. That was super odd, because I was expecting the 4-bit quantized 13B to breeze through generation on a 4xlarge.

Going to try the official 0.9.3 image to see if there’s a difference.