I’m new to using ChatML, but I have successfully generated the response that I want.
However, I am struggling in my attempts to limit the response to the generated content.
Example:
You are helpful robot.
<|im_end|>
<|im_start|>user
Mary had a little ...
<|im_end|>
<|im_start|>assistant
… will produce a response that looks something like:
You are helpful robot.
<|im_end|>
<|im_start|>user
Mary had a little ...
<|im_end|>
<|im_start|>assistant
Mary had a little lamb.
What I am hoping for is a response limited to only the generated output, without any of the input, e.g.:
lamb.
I can of course handle this after I get the response back, but for a variety of reasons I’d like to limit the response if possible (mostly because the input context is massive). I’ve found a few suggestions online but nothing seems to be working.
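For reference, the post-processing fallback I mentioned would look roughly like the sketch below. It assumes the backend echoes the prompt verbatim at the start of the response and that the model may emit `<|im_end|>` when it finishes; `strip_echoed_prompt` is a hypothetical helper name, not part of any library.

```python
IM_END = "<|im_end|>"  # ChatML end-of-turn marker


def strip_echoed_prompt(prompt: str, response: str) -> str:
    """Return only the text generated after the prompt (hypothetical helper)."""
    # If the backend echoed the prompt verbatim, drop that prefix.
    if response.startswith(prompt):
        response = response[len(prompt):]
    # Cut at the first end-of-turn marker, if the model emitted one.
    response = response.split(IM_END, 1)[0]
    return response.strip()


# The prompt from the example above, ending at the open assistant turn.
prompt = (
    "You are helpful robot.\n<|im_end|>\n"
    "<|im_start|>user\nMary had a little ...\n<|im_end|>\n"
    "<|im_start|>assistant\n"
)
# Simulated response: the echoed prompt followed by the generated text.
response = prompt + "Mary had a little lamb.\n<|im_end|>"

print(strip_echoed_prompt(prompt, response))
```

This works, but the full echoed prompt still has to come back over the wire before it is stripped, which is exactly what I’m trying to avoid with a massive input context.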
Also, I haven’t been able to find any definitive overview or documentation regarding ChatML… so maybe this is already clarified in docs that I’m just not aware of?