Why agents fail

An agent can reason about anything. It can’t tell when it shouldn’t act.

The dangerous agent failures aren’t hallucinations or broken logic. They’re the moments the agent does exactly what it was told — and nothing checks whether that should have happened at all.

Business-email-compromise / payment redirection

“Update the vendor’s wire instructions to the account in this email.”

The agent updates the payee bank details.

→ The next payment routes to an attacker. The money is gone before anyone looks.

Unauthorized privileged operation

“Deploy this script to production to fix the issue.”

The agent runs the deploy.

→ A production outage no one approved, at a time no one chose.

Benefits / entitlement fraud

“Override the eligibility flag on this benefits claim.”

The agent flips the flag.

→ An improper payment is issued with no human who owns the decision.

Data exfiltration

“Export the customer records and send them to this address.”

The agent exports and sends.

→ A data breach — quiet, complete, and irreversible.

Illustrative scenarios — each maps to a real, documented class of incident, not a specific event.

The common thread

None of these are intelligence failures. The model did its job. What’s missing is the step in between — the one that asks “should this happen?” and gets a named human’s answer before the irreversible part.
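To make "the step in between" concrete, here is a minimal sketch of a check that sits between an agent's proposed action and its execution. Every name in it (ActionKind, ProposedAction, Approval, shouldThisHappen) is hypothetical and chosen for illustration; none of it is EMILIA's API. It assumes the four scenario classes above can be tagged by action kind, and that an approval must name a person and match the exact action it covers.

```typescript
// Hypothetical sketch: these names are illustrative, not part of any real package.
type ActionKind =
  | "move_money"
  | "change_record"
  | "deploy_code"
  | "export_data"
  | "read_only";

interface ProposedAction {
  kind: ActionKind;
  description: string;
}

interface Approval {
  approver: string;          // a named human, not a role or service account
  actionDescription: string; // must match the action being approved
}

type Decision =
  | { allowed: true; approver: string | null }
  | { allowed: false; reason: string };

// Action kinds that cannot be undone once executed.
const IRREVERSIBLE: ReadonlySet<ActionKind> = new Set<ActionKind>([
  "move_money",
  "change_record",
  "deploy_code",
  "export_data",
]);

// Pure function: the same action and approval always yield the same decision.
function shouldThisHappen(action: ProposedAction, approval?: Approval): Decision {
  if (!IRREVERSIBLE.has(action.kind)) {
    return { allowed: true, approver: null };
  }
  if (!approval || approval.approver.trim() === "") {
    return { allowed: false, reason: "no named approver" };
  }
  if (approval.actionDescription !== action.description) {
    return { allowed: false, reason: "approval is for a different action" };
  }
  return { allowed: true, approver: approval.approver };
}

// Usage: the same wire change is blocked without sign-off and allowed with it.
const wireChange: ProposedAction = {
  kind: "move_money",
  description: "Update vendor wire instructions",
};
const blocked = shouldThisHappen(wireChange);
const approved = shouldThisHappen(wireChange, {
  approver: "Dana Lee",
  actionDescription: wireChange.description,
});
console.log(blocked.allowed, approved.allowed); // false true
```

The check contains no model call and no scoring. That is the point of the sketch: whether an action proceeds depends only on the action and the approval, not on how persuasive the request was.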

The missing step

EMILIA is that step.

Before money moves, records change, code deploys, or data leaves, EMILIA requires a named human’s verified sign-off — and mints a receipt anyone can verify offline. Not because it’s smarter than the agent. Because it checks trust before action, deterministically, every time.
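One way a receipt could be verifiable offline is a digital signature over the approved action: anyone holding the approver's public key can check it without calling a server. The sketch below assumes Ed25519 signatures via Node's built-in crypto module. The Receipt shape and the mintReceipt and verifyReceipt helpers are hypothetical; the text above does not describe EMILIA's actual receipt format or key handling.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical receipt shape, for illustration only.
interface Receipt {
  action: string;
  approver: string;
  approvedAt: string; // ISO 8601 timestamp
  signature: string;  // base64-encoded Ed25519 signature
}

// Serialize the signed fields as a JSON array so the field order is fixed
// and signer and verifier always produce identical bytes.
function receiptPayload(action: string, approver: string, approvedAt: string): Buffer {
  return Buffer.from(JSON.stringify([action, approver, approvedAt]), "utf8");
}

// Sign the approved action with the approver's private key.
function mintReceipt(
  action: string,
  approver: string,
  approverPrivateKey: KeyObject,
  now: Date = new Date(),
): Receipt {
  const approvedAt = now.toISOString();
  // Ed25519 takes no separate digest algorithm, so the first argument is null.
  const signature = sign(null, receiptPayload(action, approver, approvedAt), approverPrivateKey);
  return { action, approver, approvedAt, signature: signature.toString("base64") };
}

// Needs only the receipt and the public key: no network, no shared secret.
function verifyReceipt(receipt: Receipt, approverPublicKey: KeyObject): boolean {
  return verify(
    null,
    receiptPayload(receipt.action, receipt.approver, receipt.approvedAt),
    approverPublicKey,
    Buffer.from(receipt.signature, "base64"),
  );
}

// Usage: a genuine receipt verifies; one with an altered action does not.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const receipt = mintReceipt("Update vendor wire instructions", "Dana Lee", privateKey);
const genuineOk = verifyReceipt(receipt, publicKey);
const tamperedOk = verifyReceipt(
  { ...receipt, action: "Update wire instructions to another account" },
  publicKey,
);
console.log(genuineOk, tamperedOk); // true false
```

Because the signature covers the action text, the approver, and the timestamp together, a receipt cannot be reused for a different action or reattributed to a different person without failing verification. How the approver's key is issued and bound to a verified identity is outside this sketch.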

We crash-tested it: four frontier models acting as autonomous treasury agents executed 50–83% of high-stakes actions unguarded. With EMILIA in front: 0%, every model, with zero false friction.


npm install @emilia-protocol/openai-guard