Need help deploying the TheBloke/vicuna-13B-v1.5-GGUF model on AWS

I want to deploy the TheBloke/vicuna-13B-v1.5-GGUF model on AWS.

I want to use this model as an endpoint in my web application, in the following format:

Chatbot Requirements

  1. Scope: Chatbot (a decoder-only LLM for text inference / conversational use)
  2. Input via API (JSON): ChatGPT style – the prompt template can be seen below

The JSON will contain the 25 user messages from the chat buffer, and the response should be the system's reply.
Please use these guidelines to understand API consumption: InvokeEndpoint - Amazon SageMaker.
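
For reference, here is a minimal sketch of how I expect to call the endpoint with boto3. The endpoint name and the payload keys ("messages", "parameters") are placeholders of my own, not a finished spec:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Hypothetical payload shape: the buffered user messages plus generation parameters.
payload = {
    "messages": [
        {"role": "user", "content": "What is photosynthesis?"},
        # ... up to 25 buffered user messages
    ],
    "parameters": {"max_new_tokens": 512, "temperature": 0.7, "top_p": 0.9},
}

response = runtime.invoke_endpoint(
    EndpointName="vicuna-13b-v1-5-endpoint",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```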

  1. Prompt template for the system:
    a. template = '''
    You are going to be my education assistant.
    System:{System}
    Question:{question}'''

  2. LLM generation parameters: max_new_tokens=512, temperature=0.7, top_p=0.9

  3. If possible, use AutoModelForCausalLM; otherwise, train an LLM (see the loading sketch after this list).

  4. It will be deployed on AWS SageMaker using S3 buckets.

  5. The GGUF file should be stored in an S3 bucket (see the upload sketch after this list).

  6. The chat buffer should store 25 conversations and create a session ID (no need to send this to the endpoint; see the memory sketch after this list).

  7. The quantized model is here: vicuna-13b-v1.5.Q4_K_M.gguf · TheBloke/vicuna-13B-v1.5-GGUF at main

  8. Use Hugging Face / LangChain where possible.

  9. Deliverables: Jupyter notebook/code; two hours are budgeted to set up the model on AWS with the customer.
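
To make items 1-3 concrete, here is a rough local sketch of what I have in mind. It assumes the ctransformers library, which provides an AutoModelForCausalLM-style wrapper that can load GGUF files directly; the system instruction and question strings are placeholders of mine:

```python
from ctransformers import AutoModelForCausalLM

# Load the quantized GGUF weights (downloads from the Hub on first run).
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/vicuna-13B-v1.5-GGUF",
    model_file="vicuna-13b-v1.5.Q4_K_M.gguf",
    model_type="llama",
)

template = '''
You are going to be my education assistant.
System:{System}
Question:{question}'''

# Placeholder system instruction and question.
prompt = template.format(System="Answer concisely.", question="What is photosynthesis?")

print(llm(prompt, max_new_tokens=512, temperature=0.7, top_p=0.9))
```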
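
For item 5, uploading the GGUF file to S3 could look like the following; the bucket name and key are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket and key; the uploaded artifact would then be
# referenced from SageMaker when creating the model/endpoint.
s3.upload_file(
    "vicuna-13b-v1.5.Q4_K_M.gguf",
    "my-model-bucket",
    "models/vicuna-13b-v1.5.Q4_K_M.gguf",
)
```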
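
And for items 6 and 8, a client-side memory sketch using LangChain's ConversationBufferWindowMemory; k=25 keeps the last 25 exchanges, and the session ID stays local rather than being sent to the endpoint:

```python
import uuid
from langchain.memory import ConversationBufferWindowMemory

session_id = str(uuid.uuid4())  # created per session, never sent to the endpoint
memory = ConversationBufferWindowMemory(k=25)  # keep the last 25 exchanges

# After each round trip, store the user message and the model's reply.
memory.save_context(
    {"input": "What is photosynthesis?"},
    {"output": "Photosynthesis is ..."},
)

# History to include in the next request payload.
history = memory.load_memory_variables({})["history"]
```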

Could someone provide complete source code that I can use in a Jupyter notebook on AWS to create the endpoint?
I need it as soon as possible.