How to force the assistant to write some tokens mid-generation?

Here’s an example:

User: Hello make a python function for something
Assistant: Here’s a function for that:

def function():
    pass

<codetests> ← This is a line we tuned the model to generate
import pytest
assert foo == bar
</codetests> ← Execute the tests right after this token was predicted
Result: tests succeeded ← THESE are the forced tokens
Ok, looks like the function is working…
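
Here is a rough sketch of the flow I have in mind, assuming a Hugging Face causal LM ("gpt2" is just a placeholder for the tuned model, and run_tests() is a hypothetical helper, not real project code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")         # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

def run_tests(test_code: str) -> bool:
    # Hypothetical helper: run the generated tests (e.g. pytest in a sandbox)
    # and return True on success. Stubbed out here.
    return True

prompt = "User: Hello make a python function for something\nAssistant:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# 1) Let the model generate freely until it closes the test block
#    (assumes the tuned model actually emits </codetests>).
while True:
    with torch.no_grad():
        logits = model(input_ids).logits[:, -1, :]        # next-token logits
    next_id = torch.argmax(logits, dim=-1, keepdim=True)  # greedy decoding
    input_ids = torch.cat([input_ids, next_id], dim=-1)
    text = tokenizer.decode(input_ids[0])
    if text.endswith("</codetests>") or input_ids.shape[1] > 512:
        break

# 2) Execute the tests outside the model, then force the verdict tokens.
test_code = text.split("<codetests>")[-1].split("</codetests>")[0]
verdict = "\nResult: tests succeeded\n" if run_tests(test_code) else "\nResult: tests failed\n"
forced_ids = tokenizer(verdict, return_tensors="pt").input_ids
input_ids = torch.cat([input_ids, forced_ids], dim=-1)

# 3) Continue generating; the model now conditions on the forced verdict.
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0]))
```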

EDIT:

The LLM is trained to respond with the same block given above, but since LLMs are bad at detecting their own mistakes, they will lean towards saying "succeeded" for everything.
However, after the inference pass for the token "succeeded" there will be a probability distribution, e.g.

succeeded 0.5
failed 0.3
etc.

So I want to "force" the model to pick failed (or succeeded) even though it is a less likely token. Seems like something very simple, but would it mean hacking into transformers?
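
Something along these lines is what I have in mind, done in my own decoding loop rather than inside model.generate() (a rough, self-contained sketch; "gpt2", the context string, and the " succeeded" / " failed" strings are just placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")         # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

context = "</codetests>\nResult: tests"                   # text generated so far
input_ids = tokenizer(context, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits[:, -1, :]            # one forward pass
probs = torch.softmax(logits, dim=-1)

for word in [" succeeded", " failed"]:
    tid = tokenizer(word, add_special_tokens=False).input_ids[0]  # (first) sub-token id
    print(word, probs[0, tid].item())                     # inspect both options

# To "force" the less likely option, skip sampling and append its tokens directly.
forced = tokenizer(" failed", add_special_tokens=False,
                   return_tensors="pt").input_ids
input_ids = torch.cat([input_ids, forced], dim=-1)
```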
