Terminal Goblin

Finding

Hermes loses operational credibility when agents answer status, build, test, or system questions from assumptions instead of verifying live state with terminal evidence.

Current

A real Hermes installation often has access to shell tools for checking versions, running tests, inspecting processes, validating builds, and confirming service behavior. The weak point is usually discipline: agents may summarize what should be true from memory, documentation, or prior sessions without running the one command that proves the current state. That creates confident but stale answers, especially after deploys, dependency changes, restarts, failed builds, or environment drift.

Suggested

Require terminal verification for live-state claims. Exact change: add a “Terminal Goblin rule” to SOUL.md or the main operator prompt: “For current system state, versions, tests, builds, ports, services, disk, memory, dates, hashes, or command availability, verify with terminal before answering.”
Add a standard shell evidence block to build and deploy runbooks. Exact change: update docs/runbooks/build-and-deploy.md or the relevant verification checklist with required commands for each change type, such as version check, test command, build command, service status check, and one smoke test before declaring success.
Patch troubleshooting skills to separate hypotheses from verified facts. Exact change: add a line to the systematic debugging or Hermes operations skill: “Do not mark a diagnosis confirmed until a terminal command, test result, log excerpt, or local file inspection verifies it; label unverified ideas as hypotheses.”
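
One way to picture the shell evidence block and the verified-versus-hypothesis split is a small helper that runs each check and records the command, its exit code, and a verdict. This is a minimal sketch, not Hermes tooling: the names Check, run_check, and evidence_block are hypothetical, and the example commands call the Python interpreter only so the sketch stays portable. A real runbook would substitute its own version, test, build, and service status commands.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Check:
    """Result of one terminal verification step (hypothetical structure)."""
    label: str
    command: list[str]
    exit_code: int
    output: str

    @property
    def verified(self) -> bool:
        # A claim counts as verified only if the command actually succeeded.
        return self.exit_code == 0


def run_check(label: str, command: list[str], timeout: float = 60.0) -> Check:
    """Run one command and capture its exit code and combined output."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
        output = (proc.stdout + proc.stderr).strip()
        return Check(label, command, proc.returncode, output)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing binary or hung command: record as unverified, never as success.
        return Check(label, command, -1, str(exc))


def evidence_block(checks: list[Check]) -> str:
    """Render checks as plain text, labeling anything that failed as unverified."""
    lines = []
    for c in checks:
        status = "VERIFIED" if c.verified else "UNVERIFIED"
        cmd = " ".join(c.command)
        lines.append(f"[{status}] {c.label}: {cmd} (exit {c.exit_code})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder commands; swap in the runbook's real checks.
    checks = [
        run_check("interpreter version", [sys.executable, "--version"]),
        run_check("smoke test", [sys.executable, "-c", "assert 1 + 1 == 2"]),
    ]
    print(evidence_block(checks))
```

The design choice worth noting is that a command that cannot run at all (missing binary, timeout) is recorded as unverified instead of being skipped, so an agent cannot declare success from a check that never produced evidence.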

Impact

This makes Hermes status reporting grounded instead of performative. Terminal verification reduces false success reports after builds, catches environment drift early, and gives the operator a concrete basis for decisions. It also strengthens public-facing maturity content: Hermes can describe its evidence-based checks without exposing private logs or command output.

Effort

Small: the capability already exists. The work is adding one prompt rule, one runbook checklist, and one troubleshooting habit so agents consistently verify before summarizing.

Public page note

Safe public content includes the operational principle, generic examples of terminal verification, and the maturity benefit of evidence-based status checks. Internal-only content includes raw command output, logs, credentials, environment variables, private file paths, hostnames, deployment details, customer data, and any live operational state that should not be exposed publicly.