Model Sommelier

Finding

Model routing is weak when Hermes chooses models by habit instead of matching provider, cost, latency, reliability, and reasoning depth to the actual workflow type.

Current

A real Hermes installation often accumulates several available models across primary, fallback, research, coding, cheap, and high-context use cases. The weak point is not having too few models; it is operating without a clear routing policy. Without a documented model-selection rule, easy tasks may waste premium tokens, hard tasks may be sent to weak models, fallbacks may be treated as interchangeable with the primary model, and cron or subagent workloads may quietly inherit a default model that is not cost-effective for their purpose.
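The inherited-default problem can be made visible with a small audit. The sketch below is illustrative only: the Workload record, its fields, and the workload names are hypothetical stand-ins, since the real Hermes configuration format is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Workload:
    """Hypothetical record for one configured workload (not a Hermes type)."""
    name: str
    kind: str                    # e.g. "cron", "subagent", "interactive"
    model: Optional[str] = None  # None means "inherit the default model"


def inherits_default(workloads: List[Workload]) -> List[str]:
    """Return names of cron/subagent workloads that set no explicit model."""
    return [
        w.name
        for w in workloads
        if w.kind in {"cron", "subagent"} and w.model is None
    ]


# Example data; every name here is made up for illustration.
workloads = [
    Workload("nightly-health-check", "cron"),
    Workload("research-digest", "subagent", model="high-context-model"),
    Workload("chat", "interactive"),
]

print(inherits_default(workloads))
```

Any workload this audit lists is a candidate for an explicit entry in the routing policy.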

Suggested

Create a model routing matrix for common Hermes work types. Exact change: add docs/runbooks/model-routing.md with rows for quick chat, Hermes config review, coding/debugging, long research, cron monitoring, public-page drafting, and multi-agent analysis; for each row name the preferred model tier, acceptable fallback tier, and “do not use” cases.
Add a model-choice preflight to recurring and delegated work. Exact change: patch the Optimizer Agent cron prompt and delegation runbook with: “Before running recurring or subagent work, choose the smallest model that can satisfy the task, and only escalate when the task needs stronger reasoning, longer context, or higher reliability.”
Add a lightweight verification habit after model changes. Exact change: create a dashboard or runbook checklist item named “Model Sommelier check” that records whether the selected model produced acceptable quality, latency, and cost for the workflow type. When the same result repeats for a workflow type, update the model routing matrix to match.
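The routing matrix and the preflight rule can be sketched as a lookup table plus one escalation step. This is a minimal sketch under stated assumptions: the tier names, the tier assigned to each work type, and the "do not use" strings are placeholders, not recommendations taken from a real installation, and a single cheap-to-premium ladder is a simplification (long context, for example, may be a separate axis in practice).

```python
from dataclasses import dataclass

# Assumed tier ladder, ordered from smallest to most capable.
TIERS = ["cheap", "standard", "premium"]


@dataclass(frozen=True)
class Route:
    preferred: str   # tier to try first
    fallback: str    # acceptable alternative tier
    do_not_use: str  # placeholder description of a "do not use" case


# Work types come from the runbook proposal; tier choices are illustrative.
ROUTING_MATRIX = {
    "quick chat": Route("cheap", "standard", "multi-step analysis"),
    "Hermes config review": Route("standard", "premium", "unattended edits"),
    "coding/debugging": Route("standard", "premium", "large refactors on cheap tier"),
    "long research": Route("premium", "standard", "short lookups"),
    "cron monitoring": Route("cheap", "standard", "routine checks on premium tier"),
    "public-page drafting": Route("standard", "cheap", "internal-only material"),
    "multi-agent analysis": Route("premium", "standard", "single-step tasks"),
}


def choose_tier(
    work_type: str,
    needs_deep_reasoning: bool = False,
    needs_long_context: bool = False,
    needs_high_reliability: bool = False,
) -> str:
    """Start from the preferred (smallest adequate) tier; escalate one step
    only when the task states a concrete need."""
    route = ROUTING_MATRIX[work_type]
    if needs_deep_reasoning or needs_long_context or needs_high_reliability:
        step_up = min(TIERS.index(route.preferred) + 1, len(TIERS) - 1)
        return TIERS[step_up]
    return route.preferred


print(choose_tier("cron monitoring"))
print(choose_tier("cron monitoring", needs_high_reliability=True))
```

Keeping the escalation reasons as explicit flags means a cron prompt or delegation call has to state why it needs more than the preferred tier.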

Impact

This improves the quality/cost balance of Hermes without reducing capability. Strong models remain available for tasks that need deep reasoning, while routine monitoring, drafting, and extraction work can use cheaper or faster routes. It also makes fallback behavior safer because alternatives are treated as role-specific choices, not interchangeable names in a config file.

Effort

Small. The main work is one routing runbook, one preflight line in the cron and delegation prompts, and a short review loop after repeated model use. No new infrastructure is required unless the installation later adds or removes providers.
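The review loop needs little more than a list of check records and a count. The sketch below assumes a hypothetical SommelierCheck record and an arbitrary repeat threshold of three. Neither comes from an existing Hermes feature.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SommelierCheck:
    """Hypothetical record of one "Model Sommelier check"."""
    work_type: str
    tier: str
    quality_ok: bool
    latency_ok: bool
    cost_ok: bool

    @property
    def acceptable(self) -> bool:
        return self.quality_ok and self.latency_ok and self.cost_ok


REPEAT_THRESHOLD = 3  # assumed value for "a pattern repeats"


def routes_to_review(
    checks: List[SommelierCheck], threshold: int = REPEAT_THRESHOLD
) -> List[Tuple[str, str]]:
    """Return (work_type, tier) pairs that failed at least `threshold` checks."""
    failures = Counter(
        (c.work_type, c.tier) for c in checks if not c.acceptable
    )
    return sorted(pair for pair, n in failures.items() if n >= threshold)


# Example data, made up for illustration.
checks = [
    SommelierCheck("cron monitoring", "premium", True, True, False),
    SommelierCheck("cron monitoring", "premium", True, True, False),
    SommelierCheck("cron monitoring", "premium", True, False, False),
    SommelierCheck("long research", "standard", False, True, True),
    SommelierCheck("quick chat", "cheap", True, True, True),
]

print(routes_to_review(checks))
```

A pair that appears in the result is a prompt to revisit that row of the routing matrix, not an automatic change.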

Public page note

Safe public content includes the operating principle, generic model-routing categories, cost/quality tradeoff language, and examples of verification habits. Internal-only content includes real provider keys, exact billing data, private benchmark logs, raw task transcripts, sensitive prompts, customer workloads, and live model configuration values.