Finding
Hermes debugging becomes slower and riskier when fixes are proposed before the relevant logs, traces, and execution output have been inspected.
Current
A real Hermes installation often has several places where failures can surface: CLI output, gateway/container logs, cron execution history, MCP/tool errors, provider responses, and agent session transcripts. The weak point is not log availability; it is discipline. Without a standard “logs-first” habit, operators can spend time changing prompts, config, providers, or workflows based on symptoms rather than root-cause evidence.
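One way to make the logs-first habit concrete is a gate that refuses to move on to fixes until every relevant log source has been either checked or explicitly marked unavailable. This is a minimal Python sketch, not part of Hermes. The source labels, SourceStatus, and ready_for_fix are hypothetical names chosen to mirror the log sources listed above.

```python
from enum import Enum


class SourceStatus(Enum):
    UNCHECKED = "unchecked"
    CHECKED = "checked"
    UNAVAILABLE = "unavailable"  # explicitly marked as not inspectable


# Hypothetical labels for the log sources named above; adapt them to your installation.
LOG_SOURCES = frozenset({
    "cli_output",
    "gateway_container_logs",
    "cron_history",
    "mcp_tool_errors",
    "provider_responses",
    "session_transcripts",
})


def ready_for_fix(statuses: dict[str, SourceStatus], relevant: set[str]) -> bool:
    """Return True only if every relevant source is CHECKED or explicitly UNAVAILABLE."""
    if not relevant:
        raise ValueError("name at least one relevant log source")
    unknown = relevant - LOG_SOURCES
    if unknown:
        raise ValueError(f"unknown log sources: {sorted(unknown)}")
    # A source with no recorded status counts as unchecked.
    return all(
        statuses.get(source, SourceStatus.UNCHECKED) is not SourceStatus.UNCHECKED
        for source in relevant
    )


statuses = {
    "cron_history": SourceStatus.CHECKED,
    "provider_responses": SourceStatus.UNAVAILABLE,
}
print(ready_for_fix(statuses, {"cron_history", "provider_responses"}))      # True
print(ready_for_fix(statuses, {"cron_history", "gateway_container_logs"}))  # False
```

Requiring at least one named source is a deliberate choice: an empty "relevant" set would otherwise let the gate pass without any inspection at all.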
Suggested
- Add a logs-first rule to the debugging runbook. Exact change: create or update ~/hermes-runbooks/debugging.md with a required first step: “Before changing config, prompts, tools, cron jobs, or code, capture the relevant error output, timestamp, component, and last successful state.”
- Patch the system/debugging skill to require evidence before remedies. Exact change: update the systematic-debugging skill with a checklist item: “Do not propose a fix until the relevant log source has been checked or explicitly marked unavailable.”
- Add verification notes to cron and gateway incident handling. Exact change: update the relevant cron prompt, dashboard copy, or incident template with: “For every failure report, include checked log source, time window inspected, observed error class, and whether the same error repeated after the fix.”
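The runbook step and the incident template together define one evidence record per failure. The following minimal Python sketch assumes that record is kept as a dataclass. IncidentEvidence, missing_evidence, and can_close are hypothetical names, not an existing Hermes schema. The fields map one-to-one onto the items quoted in the rules above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class IncidentEvidence:
    """Hypothetical record combining the runbook's first step and the incident template."""
    # Captured before any config, prompt, tool, cron, or code change.
    error_output: str
    timestamp: str               # when the failure was observed, e.g. ISO 8601
    component: str               # e.g. "cron", "gateway", "mcp tool"
    last_successful_state: str
    # Required in every failure report.
    log_source_checked: str      # or "unavailable: <reason>"
    time_window_inspected: str
    error_class: str
    repeated_after_fix: Optional[bool] = None  # None until the fix has been verified


def missing_evidence(report: IncidentEvidence) -> list[str]:
    """Names of required text fields that are empty or whitespace-only."""
    return [
        f.name
        for f in fields(report)
        if isinstance(getattr(report, f.name), str) and not getattr(report, f.name).strip()
    ]


def can_close(report: IncidentEvidence) -> bool:
    """An incident closes only with complete evidence and verified non-recurrence."""
    return not missing_evidence(report) and report.repeated_after_fix is False


# Placeholder values for illustration only.
report = IncidentEvidence(
    error_output="<captured error text>",
    timestamp="<when it failed>",
    component="",  # forgot to record the component
    last_successful_state="<last known good run>",
    log_source_checked="cron execution history",
    time_window_inspected="<start>/<end>",
    error_class="<error class>",
)
print(missing_evidence(report))  # ['component']
print(can_close(report))         # False
```

Keeping repeated_after_fix as None until someone verifies the fix means an incident cannot be closed just because the evidence fields are filled in.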
Impact
This reduces false fixes, unnecessary config churn, and repeated trial-and-error debugging. It also makes handoffs cleaner because every incident starts with the same evidence shape: what failed, where it failed, when it failed, and what the logs actually showed. Over time, repeated log patterns can be converted into skills, runbooks, or monitoring checks instead of being rediscovered manually.
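Because every incident carries an observed error class, recurring patterns can be found by counting. This small Python sketch shows how candidates for a skill, runbook, or monitoring check could be surfaced. The function name, the threshold, and the error-class labels are hypothetical, and the approach assumes error classes are recorded as short text labels.

```python
from collections import Counter


def recurring_error_classes(error_classes: list[str], min_count: int = 3) -> list[tuple[str, int]]:
    """Error classes seen at least min_count times, most frequent first."""
    # Normalize labels so "Rate_Limit" and "rate_limit " count as one pattern.
    counts = Counter(c.strip().lower() for c in error_classes if c.strip())
    return [(cls, n) for cls, n in counts.most_common() if n >= min_count]


# Hypothetical error-class labels from past incident reports.
history = ["rate_limit", "tool_timeout", "Rate_Limit", "auth_expired", "rate_limit ", "tool_timeout"]
print(recurring_error_classes(history, min_count=2))
# [('rate_limit', 3), ('tool_timeout', 2)]
```

Normalizing labels before counting matters in practice: inconsistent capitalization or trailing spaces would otherwise split one pattern into several low counts that never cross the threshold.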
Effort
Small — this is mostly a runbook, skill, and prompt discipline change. It does not require exposing logs publicly or building new infrastructure; it requires making log inspection the default first action before guessing.
Public page note
Safe public content includes the operational principle, the logs-first checklist, generic examples of log sources, and the maturity benefit. Content that must remain internal-only includes raw logs, stack traces, session transcripts, container names, private paths, credentials, environment variable values, provider keys, user messages, and exact production incident details.
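When deriving a public page from an internal incident record, an allowlist is safer than a denylist. A newly added field, such as a container name or a stack trace, then stays internal by default unless someone explicitly approves it. The following Python sketch illustrates this assumption. PUBLIC_FIELDS, public_view, and the field names are hypothetical.

```python
# Hypothetical allowlist: only these fields may appear on the public page.
PUBLIC_FIELDS = frozenset({
    "principle",
    "logs_first_checklist",
    "generic_log_sources",
    "maturity_benefit",
})


def public_view(record: dict[str, object]) -> dict[str, object]:
    """Keep allowlisted fields only; anything unlisted is treated as internal."""
    return {key: value for key, value in record.items() if key in PUBLIC_FIELDS}


record = {
    "principle": "Inspect logs before proposing fixes.",
    "generic_log_sources": ["CLI output", "cron execution history"],
    "raw_logs": "<internal>",
    "provider_key": "<internal>",
}
print(sorted(public_view(record)))  # ['generic_log_sources', 'principle']
```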