Training "I don't know" and "I don't understand" responses

I’ve seen that some language models give pretty good “I don’t know” or “I don’t understand” responses when given a prompt they presumably weren’t trained well for. I haven’t been able to find any explanation of how to train a model to do this. Any suggestions?