Multi-Model Mage

Finding

A Hermes installation can accumulate many models over its history, but the maturity gap is turning that variety into deliberate, role-based routing and independently judged decisions instead of accidental model switching.

Current

A real Hermes setup often accumulates multiple models through provider experiments, fallbacks, eval runs, coding tasks, research tasks, and cheap-model discovery. That history proves flexibility, but the setup can remain operationally weak if model choice is not tied to task type, risk level, cost, latency, and review role. Without a lightweight model-routing rule, the system may use a strong model for trivial work, a weak model for judgment-heavy work, or a single model’s answer as if it were independent verification.
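A lightweight routing rule of this kind can be very small. The sketch below is a minimal illustration, not an existing Hermes component: the task kinds, the three-level risk scale, and the snake_case role names (adapted from the role list suggested for the runbook) are all assumptions. It returns a role rather than a model name, plus an optional review role, so that swapping which model fills a role does not change the rule itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical task kinds; a real setup would list its own.
ROUTINE_KINDS = {"drafting", "summarizing", "formatting"}


@dataclass(frozen=True)
class Task:
    kind: str
    risk: Risk


def route(task: Task) -> Tuple[str, Optional[str]]:
    """Return (worker role, review role or None) for a task.

    Roles, not model names, are returned so the role map can change
    which model fills a role without touching this rule.
    """
    # High-risk or judgment-heavy work: strongest role plus a critic pass.
    if task.risk is Risk.HIGH:
        return ("primary_reasoning", "judge_critic")
    # Code changes always get a dedicated reviewer.
    if task.kind == "coding":
        return ("primary_reasoning", "coding_reviewer")
    # Broad exploration goes to the scout role.
    if task.kind == "research":
        return ("research_scout", None)
    # Low-risk routine work is where cheap, fast models save cost and latency.
    if task.kind in ROUTINE_KINDS and task.risk is Risk.LOW:
        return ("cheap_drafting", None)
    # Everything else, including medium-risk routine work and unknown kinds,
    # defaults to the stronger role rather than the cheaper one.
    return ("primary_reasoning", None)
```

Defaulting unknown or medium-risk work to the stronger role trades some cost for safety; a setup that is more cost-sensitive could flip that default, but then a wrong guess lands on the weak-model-for-judgment failure described above.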

Suggested

Define model roles instead of only model names. Exact change: add a “Model role map” section to SOUL.md or the Hermes operator runbook with roles such as primary reasoning model, cheap drafting model, coding reviewer, research scout, judge/critic, and fallback model; each role should include when to use it and when not to use it.
Add multi-model judging for high-impact decisions. Exact change: patch the evaluation or architecture review prompt with: “For provider changes, agent architecture, public dashboard recommendations, and workflow design, get one independent judge pass from a different model family or provider before the final recommendation.”
Track model routing outcomes in a public-safe maturity note. Exact change: add a dashboard or runbook entry named “Model routing verification” that records only generic outcomes such as cost, latency, task fit, and reliability patterns, while excluding prompts, private outputs, credentials, raw eval logs, and user-specific data.
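The judging rule can also be checked mechanically before a high-impact recommendation goes out. The following minimal sketch assumes a hypothetical `ModelEntry` record that carries a provider and a model family, and uses placeholder labels for the four high-impact categories named in the prompt patch. It prefers a judge from a different family and falls back to a different provider, which satisfies the “family or provider” wording while favoring the stronger form of independence.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Placeholder labels for the high-impact decision categories.
HIGH_IMPACT = {
    "provider_change",
    "agent_architecture",
    "public_dashboard_recommendation",
    "workflow_design",
}


@dataclass(frozen=True)
class ModelEntry:
    name: str      # placeholder identifier, not a real model name
    provider: str
    family: str


def pick_judge(primary: ModelEntry,
               candidates: Sequence[ModelEntry]) -> Optional[ModelEntry]:
    """Pick an independent judge for `primary`, or None if none exists."""
    # Strongest independence: a different model family.
    for c in candidates:
        if c.family != primary.family:
            return c
    # Weaker but still allowed by the rule: same family, different provider.
    for c in candidates:
        if c.provider != primary.provider:
            return c
    return None


def plan_review(decision_kind: str, primary: ModelEntry,
                candidates: Sequence[ModelEntry]) -> Optional[ModelEntry]:
    """Return the judge to use, None if no judge is required, or raise."""
    if decision_kind not in HIGH_IMPACT:
        return None
    judge = pick_judge(primary, candidates)
    if judge is None:
        raise ValueError(f"no independent judge available for {decision_kind!r}")
    return judge
```

Raising instead of quietly returning the primary model's answer turns a missing independent judge into a visible failure, which is exactly the case the rule exists to prevent: one model's output being treated as if it were independent verification.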

Impact

This makes model diversity useful instead of decorative. Hermes becomes more resilient because critical recommendations no longer depend on one provider, one model family, or one reasoning style. It also improves cost control: cheap or fast models can handle routine drafting and scouting, while stronger models are reserved for judgment, planning, and verification.

Effort

Medium: the change is mostly prompt and runbook work, but it requires one careful pass over existing model usage and a small habit change for high-impact reviews.

Public page note

Safe public content includes the maturity principle, generic model roles, public-safe judging patterns, and non-sensitive routing guidance. Internal-only content includes exact provider keys, private prompts, raw model outputs, eval transcripts, user chat content, credentials, billing data, config values, and any live operational routing details.
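One way to keep the “Model routing verification” entry public-safe is an allowlist rather than a denylist: only named generic fields survive, and each value must come from a small fixed vocabulary, so free text that might carry prompt or output content is dropped. The sketch below is illustrative only; the field names, role labels, and low/medium/high bands are assumptions, not an existing Hermes schema.

```python
from typing import Any, Dict, Mapping

# Assumed vocabularies for illustration; not an existing Hermes schema.
ROLES = {
    "primary_reasoning", "cheap_drafting", "coding_reviewer",
    "research_scout", "judge_critic", "fallback",
}
BANDS = {"low", "medium", "high"}

# Allowlist: the only fields that may appear in the public note.
PUBLIC_SCHEMA = {
    "role": ROLES,
    "task_fit": BANDS,
    "cost": BANDS,
    "latency": BANDS,
    "reliability": BANDS,
}


def to_public_note(record: Mapping[str, Any]) -> Dict[str, str]:
    """Reduce an internal routing record to public-safe generic outcomes."""
    public: Dict[str, str] = {}
    for field, allowed in PUBLIC_SCHEMA.items():
        value = record.get(field)
        # Only short, known labels pass; free text, numbers, and raw data are dropped.
        if isinstance(value, str) and value in allowed:
            public[field] = value
    return public
```

An allowlist fails closed: a new internal field added to the raw record later (a prompt, a billing figure, a live config value) is excluded by default instead of leaking until someone remembers to extend a block list.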