Deploying Fine-Tuned Falcon 40B with QLoRA on SageMaker Inference Error

philschmid · July 18, 2023, 8:51am

Hello,

We updated the script and requirements to merge the weights and save them in the safetensors format during training, so no conversion is needed on the LLM inference container side.

If you update your copy of the script and requirements, deploying after training should work.
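For anyone who wants to see roughly what "merge and save as safetensors" means, here is a minimal sketch. It is not the actual updated training script. It assumes the `peft`, `transformers`, and `torch` packages are installed, that the training job wrote the LoRA adapter to a local directory, and that the tokenizer was saved alongside it. The function names `merge_and_save` and `has_safetensors_weights` and both directory arguments are hypothetical.

```python
from pathlib import Path


def merge_and_save(adapter_dir: str, output_dir: str) -> None:
    """Merge LoRA adapter weights into the base model and save as safetensors.

    Hypothetical helper: adapter_dir and output_dir are placeholder paths.
    """
    # Heavy imports stay local so the check below runs without these packages.
    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # Load base model plus adapter in fp16 rather than 4-bit, so the adapter
    # can be folded into the base weights.
    model = AutoPeftModelForCausalLM.from_pretrained(
        adapter_dir,
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,  # assumption: only needed for custom model code
    )
    merged = model.merge_and_unload()

    # safe_serialization=True writes *.safetensors shards instead of *.bin.
    merged.save_pretrained(output_dir, safe_serialization=True, max_shard_size="2GB")

    # Assumption: the tokenizer files were saved next to the adapter.
    AutoTokenizer.from_pretrained(adapter_dir).save_pretrained(output_dir)


def has_safetensors_weights(model_dir: str) -> bool:
    """Return True if model_dir contains at least one *.safetensors file."""
    return any(Path(model_dir).glob("*.safetensors"))
```

Running `has_safetensors_weights` on the unpacked model artifact before deploying is a quick way to confirm that the weights are already in the format the container expects.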
