Grep n Guess

The Shift from "Should" to "Stop": What Inline Enforcement Actually Changes


Something Moved This Month

For most of AI governance's short history, the enforcement model has been advisory. Document the rules. Train the teams. Review the outputs. Flag violations after they happen. The tools supporting this model — risk registers, policy catalogs, assessment workflows — are mature and well-funded. They do what they're designed to do.

But over the past few weeks, a cluster of announcements signals a different architectural pattern emerging. Several vendors and frameworks have shipped capabilities that evaluate policy during execution rather than after it. The distinction matters more than it might appear.


What Shipped

Arcjet released AI Prompt Injection Protection, adding an inline decision point before AI inference. Rather than relying on the model itself to catch hostile inputs, Arcjet interposes at the application boundary — using identity, session, and routing context to block malicious prompts before they reach the model. The enforcement happens in the request lifecycle, not in the response review.

Apono launched Agent Privilege Guard, applying privilege controls at the moment an AI agent acts rather than relying on pre-deployment access reviews. The premise: agent permissions should be evaluated at execution time, scoped to the specific action, not inherited broadly from a static role assignment.

Bonfy described an AI-native data security architecture built around what they call "inline controls at the moment of generation" — a single policy plane governing both human and AI actors, with enforcement during content creation rather than after-the-fact detection.

Nemko outlined a three-layer defense architecture for AI in physical systems, arguing that runtime enforcement — evaluating each proposed action against policy and blocking or pausing execution — is the missing safety layer in most deployed agentic systems.

And the largest move: OneTrust introduced AI guardrail enforcement with real-time monitoring that continuously inspects models and agents, detects violations, and automatically blocks or limits risky behavior. For a platform historically rooted in consent management and privacy assessments, this represents a meaningful shift from registry-and-workflow governance to continuous runtime control.


What Actually Changes

The architectural difference between advisory and inline enforcement is worth understanding concretely, because the implications extend beyond security.

In an advisory model, governance operates on a review cycle. Policies are documented. Teams are trained. Outputs are sampled and audited. When violations are found, they're logged, escalated, and remediated. The time between violation and response might be hours, days, or — in practice — never, because the review burden exceeds the team's capacity.

In an inline model, governance operates in the execution path. Before an AI model processes a prompt, before an agent executes an action, before generated content leaves the system — a policy decision point evaluates whether the operation should proceed. The response is immediate: allow, deny, or constrain.
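The allow/deny/constrain decision point described above can be sketched in a few lines. This is a toy illustration, not any vendor's implementation; the `Request` fields and the rules inside `evaluate` are hypothetical stand-ins for the identity, session, and routing context a real engine would use.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    CONSTRAIN = "constrain"

@dataclass
class Request:
    actor: str      # identity of the human or agent making the call
    action: str     # e.g. "generate", "execute_tool"
    resource: str   # target the action touches

def evaluate(req: Request) -> Decision:
    # Hypothetical rules; a real engine would load these from a policy store.
    if req.action == "execute_tool" and req.actor.startswith("agent:"):
        # Agents may act, but only under a constrained scope.
        return Decision.CONSTRAIN
    if req.resource == "pii_store" and not req.actor.startswith("human:"):
        return Decision.DENY
    return Decision.ALLOW
```

The decision point sits in the request path: the caller runs `evaluate` before invoking the model or tool, and a `DENY` means the operation simply never happens.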

This changes three things simultaneously:

The blast radius shrinks. Advisory governance catches problems after they've occurred and potentially propagated. Inline enforcement catches them before execution. The difference between "we detected unauthorized data access in last week's audit" and "the request was blocked before it reached the data" is the difference between incident response and incident prevention.

The audit trail becomes structural. When every AI action passes through a policy decision point, the decision log is generated automatically — not by a human reviewing outputs after the fact. This produces evidence that is contemporaneous, comprehensive, and machine-generated, which for regulatory defensibility is qualitatively different from periodic manual review.

The governance load redistributes. Advisory models scale linearly with usage — more AI output means more review work for the same team. Inline enforcement scales with the policy engine, not the security team's headcount. The policy still needs to be written and maintained, but evaluation is automated at execution speed.
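The "audit trail becomes structural" point is easiest to see in code: when enforcement is the thing that writes the log, the log exists by construction. A minimal sketch, with a deliberately trivial `evaluate` rule standing in for a real policy engine:

```python
import json
import time

def evaluate(actor: str, action: str) -> str:
    # Hypothetical toy rule: agents may not delete anything.
    return "deny" if actor.startswith("agent:") and action == "delete" else "allow"

def governed(actor: str, action: str, log: list) -> str:
    decision = evaluate(actor, action)
    # The log record is a side effect of enforcement itself, emitted at the
    # moment of decision — not produced by a reviewer sampling outputs later.
    log.append(json.dumps({
        "ts": time.time(),
        "actor": actor,
        "action": action,
        "decision": decision,
    }))
    return decision

audit_log: list = []
governed("agent:billing", "delete", audit_log)
governed("human:alice", "read", audit_log)
```

Every call leaves a contemporaneous, machine-generated record, whether it was allowed or denied — the property that makes the trail defensible.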


What It Doesn't Change

Inline enforcement isn't a replacement for the advisory layer. It's an addition. And it introduces its own problems worth naming.

First, latency. Inserting a policy evaluation into every AI request adds processing time. For real-time applications — customer-facing chatbots, coding assistants, operational decision support — the tolerance for added latency is measured in milliseconds. The policy engine must be fast enough not to degrade the experience, or adoption will route around it. If the governed path is slower than the ungoverned path, people will find the ungoverned path.
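The latency constraint forces a design decision the advisory model never had to make: what happens when the policy check itself blows its budget? A sketch, assuming a hypothetical `policy_check` and a hard deadline via `concurrent.futures`:

```python
import concurrent.futures

def policy_check(prompt: str) -> bool:
    # Stand-in for a call to a policy engine; a real check might hit a
    # network service and occasionally exceed its latency budget.
    return "DROP TABLE" not in prompt

def decide(prompt: str, budget_s: float = 0.05, fail_open: bool = False) -> bool:
    # Run the check under a hard deadline. What happens on timeout is a
    # deliberate choice: fail-open preserves latency, fail-closed preserves
    # safety. (Note: executor shutdown still waits on the worker thread; a
    # production engine would cancel or detach the slow check.)
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(policy_check, prompt)
        try:
            return future.result(timeout=budget_s)
        except concurrent.futures.TimeoutError:
            return fail_open
```

Neither default is obviously right: fail-open means the ungoverned path reappears exactly when the engine is under load; fail-closed means the governance layer can take the application down with it.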

Second, policy quality becomes the binding constraint. In an advisory model, a vague or incomplete policy produces a vague or incomplete audit finding — annoying but survivable. In an inline model, a vague policy produces false positives (blocking legitimate work) or false negatives (passing violations). The tolerance for imprecise policy drops dramatically when policy is evaluated at execution time rather than reviewed at leisure.
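The precision problem is concrete. A hypothetical pair of policies for the same intent — "don't let credentials leave the system" — shows how the vague version blocks legitimate work while the precise version does not (both predicates return `True` for "allow"):

```python
import re

def vague_policy(text: str) -> bool:
    # "Block anything mentioning passwords" — over-broad, trips on
    # documentation, tests, and ordinary conversation.
    return "password" not in text.lower()

def precise_policy(text: str) -> bool:
    # Block only text that looks like an actual secret assignment
    # (illustrative pattern, not a production detector).
    return re.search(r"password\s*[:=]\s*\S+", text, re.IGNORECASE) is None
```

In an advisory model, the vague version just produces noisy audit findings; in an inline model, it blocks someone's work in real time — which is why policy quality becomes the binding constraint.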

Third, the scope problem. Arcjet operates at the prompt boundary. Apono operates at the agent privilege layer. OneTrust operates at the model monitoring layer. Bonfy operates at the data-in-use layer. Each enforces inline, but at different points in the stack. An enterprise running multiple AI platforms, multiple agent frameworks, and multiple data environments needs enforcement consistency across all of them — which no single vendor currently provides.


Where This Goes

The shift from advisory to inline enforcement is structurally significant because it changes who — or what — is responsible for catching violations. In the advisory model, that's a human reviewer operating on a cycle. In the inline model, it's a policy engine operating at the speed of the AI system it governs.

The CNCF's recent Kyverno post illustrates one version of where this converges: a policy-as-code engine handling runtime enforcement while an AI-powered agent assists with context gathering and risk evaluation. The enforcement substrate is deterministic. The intelligence layer is assistive. Neither replaces the other.

Whether this pattern holds as AI governance matures is an open question. What's clear is that the industry is moving from governance as documentation to governance as infrastructure — from something you write about to something that runs. The vendors shipping this month are placing early bets on what that infrastructure looks like.

The interesting questions aren't about whether inline enforcement is better than advisory. They're about what happens when governance has to operate at the same speed as the systems it governs — and what has to be true about the policies themselves for that to work.


Grep 'n Guess is published every Wednesday on the NPM Tech blog. It explores the structural challenges of governing AI systems that pattern-match without formal grounding.
