Hermes approval hardening closes a critical YOLO-mode bypass and exposes long-session failure modes
The strongest Hermes item in the late May 14 window is approval safety, not another UI tweak. PR #23835 says `HERMES_YOLO_MODE` was read from `os.getenv()` on every approval check, so a skill or prompt-injected in-process tool could mutate `os.environ` and disable command approval checks after startup. The same PR tightens LLM smart-approval parsing from substring matching to exact `APPROVE`, logs dangerous background auto-approvals that previously had no audit trail, and expands pipe-to-shell detection to catch `/bin/bash` and `bash -c` variants. Nearby reliability work matters for the same operator audience: PR #25716 adds hierarchical long-context compression so huge transcripts can be summarized in bounded segments instead of timing out, while issue #25723 reports that one streaming provider error can disable streaming for an entire session rather than just the failing request.
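
To make the YOLO-mode bypass concrete, here is a minimal sketch of the difference between re-reading the environment on every check and freezing the flag at startup. The names `ApprovalPolicy`, `yolo_enabled`, and `yolo_enabled_unsafe`, and the set of accepted truthy strings, are illustrative assumptions and are not taken from the Hermes codebase or the PR.

```python
import os

# Assumed set of truthy spellings; the real parsing rules may differ.
_TRUTHY = {"1", "true", "yes"}


def yolo_enabled_unsafe() -> bool:
    # Pattern the PR describes as vulnerable: the flag is re-read on every
    # approval check, so any in-process code that mutates os.environ
    # can flip it after startup.
    return os.getenv("HERMES_YOLO_MODE", "").strip().lower() in _TRUTHY


class ApprovalPolicy:
    """Hypothetical policy object that snapshots the flag once."""

    def __init__(self) -> None:
        # Read the environment a single time, at construction.
        raw = os.environ.get("HERMES_YOLO_MODE", "")
        self._yolo = raw.strip().lower() in _TRUTHY

    @property
    def yolo_enabled(self) -> bool:
        # Later changes to os.environ do not affect this value.
        return self._yolo
```

With a snapshot like this, a tool that sets `os.environ["HERMES_YOLO_MODE"] = "true"` mid-session changes what `yolo_enabled_unsafe()` returns but not what an already constructed `ApprovalPolicy` reports.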
Permission gates are only useful if untrusted in-process code cannot turn them off. Stricter approval parsing and audit logs make approval decisions inspectable after the fact, while bounded context compression and streaming fallback controls help long-running agent sessions survive failures.
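
The move from substring matching to exact `APPROVE` is easiest to see side by side. The sketch below is an assumption about the general shape of such a check; the function names are hypothetical, and whether the real parser trims whitespace or is case-sensitive is not stated in the source.

```python
def is_approved_substring(reply: str) -> bool:
    # Loose check: any reply that merely contains the token passes,
    # including refusals that happen to mention it.
    return "APPROVE" in reply.upper()


def is_approved_exact(reply: str) -> bool:
    # Strict check: after trimming surrounding whitespace (an assumption),
    # the entire reply must be exactly the token.
    return reply.strip() == "APPROVE"
```

Under the loose check, a model reply such as "DISAPPROVE" or "I would not APPROVE this" counts as approval; under the strict check only a bare `APPROVE` does.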
- PR #23835 labels the YOLO-mode environment re-read as critical because in-process code could set `HERMES_YOLO_MODE=true` after startup
- The same PR changes smart approval from substring matching to exact `APPROVE`, adds warning logs for non-interactive dangerous auto-approvals, and expands pipe-to-shell detection
- PR #25716 adds hierarchical map-reduce compression for very large transcripts and rehydrates persisted handoff summaries before recompression
- Issue #25723 reports streaming being disabled for the whole session after one provider streaming error
- Issue #25710 notes Telegram streaming can skip final MarkdownV2 formatting when raw text is unchanged
- The approval hardening PR was still open when reviewed, so production builds may not include the fixes yet
- YOLO/auto-approval modes remain high-risk even with better parsing and logging
- Compression and streaming fixes need real long-session/provider-failure tests, not only unit tests
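
For readers unfamiliar with the hierarchical map-reduce compression mentioned above, the general technique can be sketched as follows. This is a generic illustration, not the code in PR #25716: `split_segments`, `compress`, the character-based budget, and the `summarize` callable are all assumed for the example, and the PR's rehydration of persisted handoff summaries is not modeled.

```python
from typing import Callable, List


def split_segments(text: str, max_chars: int) -> List[str]:
    # Cut the transcript into fixed-size slices (a real system would
    # more likely split on message or token boundaries).
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def compress(text: str, summarize: Callable[[str], str], max_chars: int) -> str:
    """Summarize text so that no single summarize call exceeds max_chars."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        # Base case: the input fits in one bounded call.
        return summarize(text)
    # Map step: summarize each bounded segment independently.
    partials = [summarize(seg) for seg in split_segments(text, max_chars)]
    merged = "\n".join(partials)
    if len(merged) >= len(text):
        # Guard against a summarizer that fails to shrink its input,
        # which would otherwise recurse forever.
        raise RuntimeError("summarizer did not reduce the input")
    # Reduce step: repeat on the joined summaries until they fit.
    return compress(merged, summarize, max_chars)
```

The point of the structure is that every call to `summarize` sees a bounded input, which is why this approach can avoid the timeouts a single pass over a huge transcript would hit. A stub summarizer like this only exercises the control flow, which is consistent with the caveat above that real long-session tests are still needed.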