How are you enforcing runtime policy and trust boundaries in agent systems today?

I’m seeing more agent projects move beyond model safety and toward runtime controls for actions, tool calls, identity, and isolation.

We’re hosting a free online discussion on this topic with Imran Siddique, maintainer of Microsoft’s open source Agent Governance Toolkit.

I’d love to hear how others here are thinking about agent runtime governance, especially in open source stacks.

Event link in case it is useful: Building Governed AI Agents: Microsoft Agent Governance Toolkit, Thu, May 7, 2026, 7:00 PM | Meetup

The thing that bit me hardest is the gap between SDK-level deny semantics and protocol-level enforcement. SDK deny lists for MCP tools turn out to be soft boundaries — the model still discovers the tool from the subprocess and invokes it anyway, even when the tool name is explicitly listed as disallowed. The only reliable approach was physical exclusion: the MCP server isn’t part of the agent’s option set at all, so the tool genuinely isn’t there.

The other thing worth governing for is hallucinated success — agent claims it called a tool, actually skipped it, output looks plausible. Most runtime governance I see focuses on “did you allow this action,” but the symmetric problem is “did the action you allowed actually happen.” Worth asking whether the Microsoft toolkit handles tool-call attestation, because that side of trust boundaries tends to get neglected.

I agree with the point about SDK level deny lists being too soft.

For agent systems, I think policy needs to be enforced at the runtime boundary, not inside the prompt and not only inside the model SDK.

The model should never be the authority on whether an action is allowed.

A practical setup is:

  1. Every tool has an explicit capability definition.

  2. Every agent or worker has a scoped permission set.

  3. The runtime checks each requested action before execution.

  4. The tool layer refuses calls that are outside policy, even if the model asks for them.

  5. Tool results are logged with request, caller, input, output, status, and trace id.

  6. Success claims are verified against actual tool execution records.

The second point matters a lot. It is not enough to ask, “Was this action allowed?”

You also need to ask, “Did the action actually happen?”

A model can claim it called a tool, skipped a step, or completed a task. The runtime should verify that against the execution trace.

So my rule is:

Prompts can guide behavior, but runtime policy must enforce behavior.

The trust boundary belongs around tools, state mutation, credentials, external actions, and persistent memory writes.