2026-05-07 · Risk note

OpenClaw 2026.5.6 operators need a wedged-Gateway recovery plan

The newest OpenClaw issue reports converge on a practical reliability problem: once the Gateway is saturated or wedged, the normal RPC-based restart path and channel delivery can both stop working. Operators report WebSocket responses taking 15-100 s, event-loop utilization (ELU) of 99-100%, zombie sessions, node.list errors that hang every agent session, native Codex runtime stalls after tool calls, and embedded direct-lane plugin tools missing from allowlists.
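For readers unfamiliar with the metric: Node.js defines event-loop utilization as the share of a time window the loop spent active rather than idle, and performance.eventLoopUtilization() reports cumulative idle and active times that can be differenced between two samples. Assuming the Gateway is a Node.js process (an assumption; this note does not say so), the short Python sketch below shows the arithmetic. LoopSample and utilization are illustrative names, and the sample numbers are invented, not taken from the issues.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopSample:
    """Cumulative event-loop idle and active time (ms) at one instant,
    mirroring the idle/active fields Node.js reports."""
    idle_ms: float
    active_ms: float


def utilization(earlier: LoopSample, later: LoopSample) -> float:
    """Share of the window between two samples that the loop spent busy."""
    idle = later.idle_ms - earlier.idle_ms
    active = later.active_ms - earlier.active_ms
    total = idle + active
    if total <= 0:
        raise ValueError("samples must span a positive interval")
    return active / total


# Illustrative numbers: over a 10 s window the loop was idle for only 50 ms.
before = LoopSample(idle_ms=1_000.0, active_ms=4_000.0)
after = LoopSample(idle_ms=1_050.0, active_ms=13_950.0)
print(f"{utilization(before, after):.3f}")  # 0.995
```

At 0.995 the loop has almost no idle time left to service WebSocket frames, timers, or RPC calls, so every request that depends on it queues behind whatever is keeping it busy.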

Impact: Risk · Sources: 3 · Audience: operator, developer
Why it matters

A personal agent can tolerate many feature bugs, but not a failure mode where the control plane, channels, and recovery command all depend on the same wedged event loop. This is an operator playbook issue, not just a bug list.
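One practical consequence is that health checks must run outside the Gateway and carry their own hard deadline; a check that simply waits on the Gateway inherits its hang. The Python sketch below runs a hypothetical check() callable on a daemon thread and treats a missed deadline as unhealthy. What check() actually calls (an HTTP or WebSocket request to the Gateway) is left open because this note does not name a health endpoint. A bare TCP connect is not a good substitute: the kernel can complete the handshake from the listen backlog even while the event loop is blocked.

```python
import threading
import time
from typing import Callable


def probe_with_deadline(check: Callable[[], bool], timeout_s: float) -> bool:
    """Run check() on a daemon thread and report healthy only if it
    returns True before timeout_s elapses.

    check is a hypothetical callable (for example, a request to the
    Gateway); a hang or an exception counts as unhealthy.
    """
    outcome: list[bool] = []

    def run() -> None:
        try:
            outcome.append(bool(check()))
        except Exception:
            outcome.append(False)

    # Daemon thread: a hung check cannot keep the watchdog process alive.
    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    worker.join(timeout_s)
    # Still running means the deadline passed; abandon the thread.
    if worker.is_alive():
        return False
    return outcome == [True]


healthy = probe_with_deadline(lambda: True, timeout_s=1.0)
# Simulates a wedged Gateway: the check takes far longer than the deadline.
wedged = probe_with_deadline(lambda: time.sleep(2) or True, timeout_s=0.2)
print(healthy, wedged)  # True False
```

A daemon thread is used instead of a ThreadPoolExecutor context manager because leaving that context waits for the hung worker, which would defeat the deadline.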

Evidence
  • Issue #78861 reports OpenClaw v2026.5.6 with 15-100s WebSocket responses and ELU hitting 100% even after disabling Telegram and reducing concurrency
  • Issue #78908 reports a dashboard zombie session, 99-100% ELU, and model timeouts that combined into more than 20 minutes without a response
  • Issue #78915 proposes a watchdog-restart primitive because "openclaw gateway restart" goes through the Gateway's own RPC surface and therefore cannot recover a fully wedged process
  • Issue #78881 reports node.list throwing TypeErrors about undefined.trim every second, with all agent sessions hanging once the errors begin
  • Issue #78870 reports the native Codex runtime hanging during follow-up sampling after a tool call, accompanied by rising host CPU and temporary unresponsiveness
  • Issue #78865 argues for a tool-call circuit breaker after an agent retried a rate-limited operation for about 50 minutes
  • Issue #78907 plus PR #78914 cover embedded direct-lane plugin allowlists failing to materialize already-loaded tools
  • PR #78912 fixes an embedded-session streaming transport path for OpenAI-compatible providers, but the batch is not yet in a published OpenClaw release
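The circuit-breaker idea from Issue #78865 can be illustrated with the classic pattern: after a set number of consecutive failures the breaker opens and refuses further calls until a cooldown passes, then allows a single trial call. The class below is a hypothetical Python sketch, not OpenClaw's design; ToolCircuitBreaker, CircuitOpenError, max_failures, and cooldown_s are invented names, and the clock is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the tool while the breaker is open."""


class ToolCircuitBreaker:
    """Hypothetical breaker around one tool: opens after max_failures
    consecutive failures and refuses calls until cooldown_s has passed."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, tool: Callable[[], object]) -> object:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                raise CircuitOpenError("tool disabled after repeated failures")
            # Cooldown elapsed (half-open): one more failure reopens at once.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = tool()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # Any success closes the breaker fully.
        return result


now = [0.0]  # Fake clock so the example runs instantly.
breaker = ToolCircuitBreaker(max_failures=3, cooldown_s=60.0, clock=lambda: now[0])


def rate_limited_tool() -> object:
    raise RuntimeError("429 Too Many Requests")


outcomes = []
for _ in range(5):
    try:
        breaker.call(rate_limited_tool)
    except CircuitOpenError:
        outcomes.append("open")
    except RuntimeError:
        outcomes.append("failed")
print(outcomes)  # ['failed', 'failed', 'failed', 'open', 'open']
```

With a breaker like this, a retry loop against a rate-limited operation would stop after three failures and stay stopped for the cooldown, surfacing the rate limit instead of retrying for most of an hour.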
Risk notes
  • Some reports are environment-specific, including Windows, macOS, Linux, Feishu, Discord, native Codex, and custom providers
  • Several issues are closed because PRs exist or design work started, not because a tagged release has reached operators
  • Do not assume one restart path is enough: if Gateway RPC is blocked, supervisor-level recovery may be the only practical option
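That last point can be made concrete as an escalation ladder: try the in-band restart first, confirm recovery with a probe, and fall back to OS-level signals when that fails. The Python sketch assumes the watchdog started the Gateway itself as a child process, so it holds a subprocess.Popen handle; recover_gateway, rpc_restart, and probe are hypothetical names, not OpenClaw APIs, and in practice rpc_restart should itself be bounded by a timeout. An external supervisor such as systemd or launchd would then be responsible for starting a fresh process.

```python
import subprocess
import sys
from typing import Callable


def recover_gateway(proc: subprocess.Popen, rpc_restart: Callable[[], bool],
                    probe: Callable[[], bool], grace_s: float = 10.0) -> str:
    """Escalate from an in-band restart to OS-level termination.

    rpc_restart and probe are hypothetical hooks, not OpenClaw APIs.
    Returns the name of the step that ended the incident.
    """
    # Step 1: the in-band path; it only helps while the Gateway can serve RPC.
    if rpc_restart() and probe():
        return "rpc-restart"
    # Step 2: ask the OS, not the Gateway, to stop the process
    # (SIGTERM on POSIX, TerminateProcess on Windows).
    proc.terminate()
    try:
        proc.wait(timeout=grace_s)
        return "terminated"
    except subprocess.TimeoutExpired:
        # Step 3: if the Gateway installed a SIGTERM handler, a blocked event
        # loop may never run it, so force the exit (SIGKILL on POSIX).
        proc.kill()
        proc.wait()
        return "killed"


if __name__ == "__main__":
    # Stand-in for a Gateway launched by this watchdog: a process that sleeps.
    gateway = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    # Simulate an RPC restart that fails because the Gateway is wedged.
    step = recover_gateway(gateway, rpc_restart=lambda: False, probe=lambda: False,
                           grace_s=5.0)
    print(step)  # terminated
```

SIGKILL is kept as the last rung because it skips any cleanup the Gateway would otherwise do; it exists for the case described above, where the process is alive but cannot act on anything, including a polite shutdown request.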