Finding
Open-weight model use stays immature when Hermes treats local or open-weight models only as fallback curiosities instead of testing them through a deliberate, privacy-safe operating lane.
Current
A real Hermes installation can route tasks across hosted frontier models, fallback providers, and local or open-weight options. The weak point is usually not access to models; it is the absence of a clear evaluation boundary for what open-weight models should handle. Without a repeatable test lane, operators cannot tell whether open-weight models are good enough for drafting, summarization, classification, tool-light research, privacy-sensitive preprocessing, or low-cost background jobs.
Suggested
- Define a safe open-weight task lane. Exact change: add an “Open-weight model boundary” section to `SOUL.md` or the model-routing runbook stating that open-weight models should first be tested only on non-sensitive prompts, synthetic examples, public documentation, and low-risk drafts before they are considered for private or operational workloads.
- Add a lightweight model capability scorecard. Exact change: create `docs/runbooks/open-weight-model-scorecard.md` with columns for task type, model name, prompt shape, latency, cost, answer quality, tool-use reliability, privacy suitability, and final routing decision.
- Add a recurring review of open-weight opportunities. Exact change: patch the Optimizer Agent cron prompt with: “Review recent sessions for repeated low-risk tasks that could be routed to an open-weight model; recommend one candidate only when the task is public-safe, tool-light, and has a clear verification habit.”
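
The scorecard columns and the optimizer's three-part candidate rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of Hermes: the names `ScorecardRow`, `to_markdown`, and `is_candidate` are hypothetical, the 1-5 rating scale and the units are assumed, and the sample row is a placeholder rather than a measured result.

```python
from dataclasses import dataclass, fields


@dataclass
class ScorecardRow:
    """One evaluation entry; fields mirror the suggested scorecard columns."""
    task_type: str
    model_name: str
    prompt_shape: str
    latency_s: float            # assumed unit: seconds per response
    cost_usd: float             # assumed unit: US dollars per run
    answer_quality: int         # assumed scale: 1 (poor) to 5 (excellent)
    tool_use_reliability: int   # assumed scale: 1 to 5
    privacy_suitability: str    # e.g. "public-only"
    routing_decision: str       # e.g. "keep on frontier", "trial on open-weight"


def to_markdown(rows):
    """Render rows as a markdown table suitable for pasting into the runbook."""
    headers = [f.name for f in fields(ScorecardRow)]
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        cells = [str(getattr(row, name)) for name in headers]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


def is_candidate(public_safe, tool_light, has_verification_habit):
    """Apply the cron prompt's rule: all three conditions must hold."""
    return public_safe and tool_light and has_verification_habit


if __name__ == "__main__":
    # Placeholder values for illustration only; not real measurements.
    example = ScorecardRow(
        task_type="summarization",
        model_name="example-open-model",
        prompt_shape="single public document",
        latency_s=0.0,
        cost_usd=0.0,
        answer_quality=3,
        tool_use_reliability=3,
        privacy_suitability="public-only",
        routing_decision="undecided",
    )
    print(to_markdown([example]))
    print(is_candidate(public_safe=True, tool_light=True, has_verification_habit=False))
```

Keeping the candidate rule as a strict conjunction matches the prompt wording: a task that is public-safe and tool-light but has no verification habit is not recommended.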
Impact
This gives Hermes a practical path toward lower cost and greater provider independence without weakening reliability. Open-weight models become a measured operating option rather than a vague aspiration. The installation can keep frontier models for high-stakes reasoning while gradually moving safe, repeatable, or privacy-aware workloads into cheaper and more controllable routes.
Effort
Medium. The changes are mostly runbook and prompt updates, but useful adoption requires repeated comparison across real task types and a disciplined decision log.
Public page note
Safe public content includes the maturity principle, generic open-weight model routing criteria, non-sensitive evaluation patterns, and the recommendation to test with public or synthetic prompts first. Internal-only content includes private prompts, raw model transcripts, provider credentials, session metadata, routing configuration values, cost dashboards, local model endpoints, sensitive workload examples, and any operational logs.