Dear all,
I needed to change this Llama 3.2 image-reasoning example to ask the question "What is the capital of the USA?" instead.
Below is the image example:
```py
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image: "}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt"
).to(model.device)
output = model.generate(**inputs, max_new_tokens=330)
print(processor.decode(output[0]))
```
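For reference, a text-only variant of the same pattern would look roughly like this; it is a sketch that keeps the chat template but drops the image entry, and it assumes the same `processor` and `model` objects as above (`max_new_tokens=50` is just an arbitrary choice for a short answer):

```py
# Text-only variant: same chat template, but no image entry and no image argument
messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "What is the capital of the USA?"}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(
    text=input_text,
    add_special_tokens=False,
    return_tensors="pt"
).to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
```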
Omran99
I solved it using this code:

```py
question = "talk to me about Amman, Jordan?"

# Process the text input using the processor
inputs = processor(text=question, return_tensors="pt").to(model.device)

# Generate the response from the model
output = model.generate(**inputs, max_new_tokens=350)  # max_new_tokens limits response length

# Decode the model's output into human-readable text
decoded_output = processor.decode(output[0], skip_special_tokens=True)

# Remove the question part if it appears in the output by splitting the response
if question in decoded_output:
    response = decoded_output.split(question)[-1].strip()
else:
    response = decoded_output

# Print the clean response (answer only)
print(response)
```
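As an alternative to matching the question string, you can decode only the newly generated tokens, since decoder-only models return the prompt followed by the completion. A minimal sketch reusing `inputs` and `output` from the code above:

```py
# Slice off the prompt tokens so only the generated answer is decoded
prompt_len = inputs["input_ids"].shape[-1]
response = processor.decode(output[0][prompt_len:], skip_special_tokens=True)
print(response.strip())
```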
You can use Markdown in the forum, so the code is easier to read if you wrap it in triple-backtick fences: put ```py on the line before the code and ``` on the line after.
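For example, a fenced block renders like this:

```py
question = "What is the capital of the USA?"
```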