Llama 3 8B Instruct not answering questions

I'm using the code from the HF "Deploy with SageMaker" option. I'm able to deploy to an AWS endpoint, but the output doesn't answer the question. It keeps repeating the question, followed by the next predicted tokens. Any help?

Can you show the code, prompt, and the output?

Code

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
	role = sagemaker.get_execution_role()
except ValueError:
	iam = boto3.client('iam')
	role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'meta-llama/Meta-Llama-3-8B-Instruct',
	'SM_NUM_GPUS': json.dumps(1),
	'HUGGING_FACE_HUB_TOKEN': '<REPLACE WITH YOUR TOKEN>'
}

assert hub['HUGGING_FACE_HUB_TOKEN'] != '<REPLACE WITH YOUR TOKEN>', "You have to provide a token."

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	image_uri=get_huggingface_llm_image_uri("huggingface",version="1.4.2"),
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1,
	instance_type="ml.g5.2xlarge",
	container_startup_health_check_timeout=300,
)
# send request
predictor.predict({
	"inputs": "My name is Clara and I am",
})

Output

[{'generated_text': 'My name is Clara and I am an artist and a writer. I have a passion for beauty and creativity.\nI love painting and drawing, and my art style is often described as whimsical and dreamy. I find inspiration in nature, mythology, and the human experience.\nIn addition to my visual art, I also enjoy writing poetry and fiction. My writing style is often more introspective and exploratory, and I find inspiration in the world around me, from the beauty of a sunset to the mysteries of the human heart.\nI'}]

Another query, asking a question:

Question: What is your name?

[{'generated_text': 'What is your name? My name is Laura.\nWhat do you do? I am a.writer and a content strategist.\nWhat do you enjoy doing in your free time? In my free time, I enjoy reading, hiking, and spending time with my family and friends. I am also a big fan of stand-up comedy and love attending comedy shows.\nWhat do you find most rewarding about your work? I find the most rewarding part of my work is helping others tell their stories and share their ideas with the world. It'}]

It repeats the question plus the output. How can I make it behave more like a question-answer prompt?

I'm thinking it's just not generating the end-of-sequence token, so it answers the question and then continues generating until it hits the token limit.

In the example on the Hugging Face model card for Llama 3, they seem to manually set an end-of-sequence token. I'm wondering if you'll need to somehow do the same here.
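For what it's worth, one way to try this with the TGI container is to format the prompt with Llama 3's chat-template tokens yourself and pass `<|eot_id|>` as a stop string in the request parameters (TGI accepts a `stop` list under `parameters`). A sketch, reusing the `predictor` from the deployment code above:

```python
# Build a Llama 3 chat prompt manually, using the special tokens from the model card.
def build_llama3_prompt(user_message: str) -> str:
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

payload = {
    "inputs": build_llama3_prompt("What is your name?"),
    "parameters": {
        "max_new_tokens": 256,
        "stop": ["<|eot_id|>"],  # TGI cuts generation when this string is produced
    },
}

# predictor.predict(payload)
```

This avoids the continuation behaviour because the model is completing an assistant turn rather than free-running on a bare question.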

I get better output with the token. Am I doing it correctly?

Question: tell me about your self. <|eot_id|>

[{'generated_text': "tell me about your self. <|eot_id|>.widgets.Stepper(\n initial_value=0,\n min_value=0,\n max_value=10,\n step=1,\n)\nAs a chatbot, I'm a program designed to simulate conversations with humans. I was trained on vast amounts of text data, which enables me to understand and respond to a wide range of questions and topics. I'm constantly learning and improving my responses based on the interactions I have with users like you.\n\nWhen you ask me a question or provide information about"}]

What I meant was that on the model card they set that to be the end-of-sequence token within the model's configuration (NOT in the prompt).

The model then generates this token itself and stops generating.

The logic is basically that the model keeps generating new tokens based on the previous ones until it sees (generates) the end-of-sequence token OR it hits the token limit, which I think defaults to 256 in this case (though I could be wrong on the exact number).
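The loop described above can be sketched like this (the `next_token` function is a hypothetical stand-in for the model, and the EOS token id is my assumption based on the Llama 3 model card):

```python
EOS_TOKEN_ID = 128009   # <|eot_id|> for Llama 3 Instruct (assumption from the model card)
MAX_NEW_TOKENS = 256

def generate(prompt_ids, next_token):
    """Decoding sketch: extend the sequence until EOS or the token limit."""
    output = list(prompt_ids)
    for _ in range(MAX_NEW_TOKENS):
        tok = next_token(output)   # model predicts the next token from all previous ones
        if tok == EOS_TOKEN_ID:
            break                  # model generated end-of-sequence: stop early
        output.append(tok)
    return output
```

If the configured EOS id never matches what the model actually emits, the `break` is never hit and generation runs to the limit, which is exactly the repeating behaviour in the outputs above.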

I'm unsure whether SageMaker takes care of this or whether you'll need to set it yourself.

Thanks for the reply. After setting MESSAGES_API_ENABLED=true I am able to have a proper chat.

https://huggingface.co/docs/text-generation-inference/messages_api
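For anyone landing here later: with MESSAGES_API_ENABLED=true (set in the `hub` env dict of the deployment code above), the TGI container exposes an OpenAI-compatible chat interface, so the request body uses a `messages` list instead of a raw `inputs` string, and the container applies the chat template and stop tokens for you. A sketch of the payload:

```python
# Chat-style payload for a TGI endpoint deployed with MESSAGES_API_ENABLED=true.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is your name?"},
    ],
    "max_tokens": 256,
}

# response = predictor.predict(payload)
# print(response["choices"][0]["message"]["content"])
```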