Deploy the TheBloke/vicuna-13B-v1.5-GGUF model on AWS
I want to use this model as an endpoint in my web application, in the following format:
Chatbot Requirements
- Scope: Chatbot (a decoder-only LLM for text inference / conversational use)
- Input via API (JSON): ChatGPT style; the template can be seen below.
The JSON will contain 25 user messages, and the response should be the system's reply.
Please follow these guidelines to understand API consumption: InvokeEndpoint - Amazon SageMaker (an invocation sketch follows this list).
- Prompt template for the system (a LangChain sketch follows this list):
template = '''
You are going to be my education assistant.
System:{System}
Question:{question}'''
- LLM model parameters: max_new_tokens=512, temperature=0.7, top_p=0.9
- If possible, use AutoModelForCausalLM; otherwise, train an LLM (a sketch of this route follows this list).
- It will be deployed on AWS SageMaker using S3 buckets (a deployment sketch follows this list).
- The GGUF file should be saved in an S3 bucket (an upload sketch follows this list).
- The chat buffer should store 25 conversations and create a session ID; there is no need to send the session ID to the endpoint (a buffer sketch follows this list).
- The quantized model is the file vicuna-13b-v1.5.Q4_K_M.gguf in TheBloke/vicuna-13B-v1.5-GGUF (main branch) on the Hugging Face Hub.
- Use Hugging Face / LangChain where possible.
- Deliverables: Jupyter notebook/code. Two hours should be allotted to set up the model in AWS with the customer.
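For illustration, here are minimal sketches of what I mean by each piece; every name below (bucket, endpoint, container image) is a placeholder, not a final choice. First, pulling the quantized GGUF from the Hugging Face Hub and copying it into S3 (the bucket name and key are hypothetical):

# Sketch: download the quantized GGUF from the Hub and store it in S3.
# "my-model-bucket" and the key prefix are placeholders.
import boto3
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="TheBloke/vicuna-13B-v1.5-GGUF",
    filename="vicuna-13b-v1.5.Q4_K_M.gguf",
)

s3 = boto3.client("s3")
s3.upload_file(local_path, "my-model-bucket", "gguf/vicuna-13b-v1.5.Q4_K_M.gguf")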
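Next, a sketch of deploying from the S3 artifact with the SageMaker Python SDK. A GGUF file needs a runtime that can load it (for example llama-cpp-python), so this assumes a custom inference container; the image URI is a placeholder, and it assumes the model artifact is packed as a .tar.gz, which is what SageMaker expects:

# Sketch: deploy an S3 model artifact behind a SageMaker endpoint.
# The container image URI is a placeholder for a custom llama.cpp-based
# serving image; it is not a real AWS-provided image.
import sagemaker
from sagemaker.model import Model

sess = sagemaker.Session()
role = sagemaker.get_execution_role()  # valid inside a SageMaker notebook

model = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/llama-cpp-serving:latest",  # placeholder
    model_data="s3://my-model-bucket/gguf/model.tar.gz",  # GGUF packed as tar.gz (assumption)
    role=role,
    sagemaker_session=sess,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # illustrative instance choice
    endpoint_name="vicuna-13b-endpoint",  # hypothetical name, reused below
)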
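Consuming the endpoint then follows the InvokeEndpoint guidelines; the endpoint name and the JSON message schema here are assumptions for illustration, not the final contract:

# Sketch: call the deployed endpoint via the SageMaker runtime client.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

payload = {
    "messages": [  # up to 25 user messages, ChatGPT style (assumed schema)
        {"role": "user", "content": "What is photosynthesis?"},
    ],
    "parameters": {"max_new_tokens": 512, "temperature": 0.7, "top_p": 0.9},
}

response = runtime.invoke_endpoint(
    EndpointName="vicuna-13b-endpoint",  # hypothetical name from the deploy sketch
    ContentType="application/json",
    Body=json.dumps(payload),
)
result = json.loads(response["Body"].read())
print(result)  # expected to contain the system's reply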
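The prompt template maps directly onto LangChain's PromptTemplate, keeping the {System} and {question} placeholders from the brief:

# Sketch: the system prompt template expressed with LangChain.
from langchain.prompts import PromptTemplate

template = '''You are going to be my education assistant.
System:{System}
Question:{question}'''

prompt = PromptTemplate(input_variables=["System", "question"], template=template)
print(prompt.format(System="Answer concisely.", question="Define osmosis."))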
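A sketch of the chat buffer requirement: it keeps the last 25 conversation turns per session and generates a session ID that stays client-side and is never sent to the endpoint:

# Sketch: client-side chat buffer with a local-only session ID.
import uuid
from collections import deque

class ChatBuffer:
    def __init__(self, max_turns: int = 25):
        self.session_id = str(uuid.uuid4())  # kept locally, not sent to the endpoint
        self.turns = deque(maxlen=max_turns)  # oldest turns drop off automatically

    def add(self, user_msg: str, assistant_msg: str) -> None:
        self.turns.append({"user": user_msg, "assistant": assistant_msg})

    def messages(self) -> list:
        # Flatten to the ChatGPT-style message list sent in the request payload.
        out = []
        for t in self.turns:
            out.append({"role": "user", "content": t["user"]})
            out.append({"role": "assistant", "content": t["assistant"]})
        return out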
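Finally, for the AutoModelForCausalLM route: transformers loads standard checkpoints most reliably, so this sketch uses the unquantized base model lmsys/vicuna-13b-v1.5 rather than the GGUF file. Treat it as an alternative path, not the GGUF deployment itself:

# Sketch: load and generate with the unquantized base checkpoint.
# device_map="auto" requires the accelerate package to be installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-13b-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Question: Define osmosis.\nAnswer:", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=512, temperature=0.7, top_p=0.9, do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))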
Provide me with complete source code that I can use in my Jupyter notebook on AWS to create the endpoint.
I need it ASAP.