Here's an example:
User: Hello make a python function for something
Assistant: Here's a function for that: def function(): pass
<codetests> ← This is a line we tuned the model to generate
import pytest
assert foo == bar
</codetests> ← Execute the tests right after this token is predicted
Result: tests succeeded ← THESE are the forced tokens
Ok, looks like the function is working…
EDIT:
The LLM is trained to respond with the same block given above. However, since LLMs are bad at detecting when they have made a mistake, they will lean towards saying "succeeded" for everything.
However, after the inference pass for the token "succeeded" there will be a probability distribution, e.g.:
succeeded 0.5
failed 0.3
etc.
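
To make that concrete: that distribution is just the softmax over the final logits of a forward pass. A rough sketch of how to inspect it, assuming the Hugging Face transformers API ("gpt2" below is only a stand-in for the actual fine-tuned model):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder model; substitute the fine-tuned model in question.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "Result: tests"
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token only
    probs = torch.softmax(logits, dim=-1)

    # Top candidates and their probabilities, e.g. " succeeded" vs " failed"
    for p, tid in zip(*probs.topk(5)):
        print(repr(tok.decode(tid)), float(p))
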
So I want to "force" the model to pick "failed" (or "succeeded") even though it is a less likely token. It seems like something very simple, but would it mean hacking into transformers?
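
For what it's worth, this doesn't necessarily require hacking into transformers itself. Since the tests are executed externally anyway, one option is to stop generation at </codetests>, run the tests, splice the real verdict into the context yourself, and call generate() again so the model simply continues after the forced tokens. A rough sketch under the same assumptions as above (placeholder strings and model):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")   # placeholder model
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    generated_so_far = "...</codetests>\n"        # output up to the stop tag (placeholder)
    tests_passed = False                          # set by actually running the tests
    verdict = "Result: tests succeeded" if tests_passed else "Result: tests failed"

    inputs = tok(generated_so_far + verdict, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=50)
    # Print only the continuation after the forced verdict:
    print(tok.decode(out[0][inputs.input_ids.shape[1]:]))

If the goal is instead to override a single token inside one generate() call, transformers also exposes a LogitsProcessor hook (passed via the logits_processor argument) that can bias or force specific token ids at a given step, so no forking of the library should be needed either way.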