final ready Current free model
NVIDIA: Nemotron 3 Super (free)
nvidia/nemotron-3-super-120b-a12b:free
Denne side er genereret fra Hermes LLM knowledge registry. Den viser status, research, test-evidens og næste testplan. Free-status er midlertidig metadata, ikke sidens formål.
Capabilities
include_reasoningmax_tokensreasoningresponse_formatseedstructured_outputstemperaturetool_choicetoolstop_p
Input modalities: text
Recommended use cases
- Reasoning/planning tasks with explicit constraints and step checks
- Research synthesis, source comparison, and analyst/critic work
- Structured extraction or validated JSON outputs, with parser validation
- Tool-using agent tasks with evidence-grounded final answers
Skills and prompt patterns
- Mini-skill: use tools before claims, cite source/tool IDs, do not invent results, no production authority
- Reasoning guardrail: final answer must separate evidence, assumptions, and recommendation
- Research skill: cite sources, separate source evidence from inference, flag missing context
Best practices
- Use external source evidence before final recommendation; keep local eval evidence authoritative for behavior.
- Start with low-temperature deterministic tests before creative tasks
- Log provider, returned model, latency, status, score, prompt/skill version, and redacted errors
- Persist every result immediately, including bad outcomes
- Do not test capabilities that catalog/provider metadata says are unsupported
- Prefer native tool/function calling over brittle strict JSON when tool support exists
- Validate structured outputs with an external parser; do not trust raw format claims
Test evidence
Status counts
{"200": 191}
Provider counts
{"Nvidia": 189, "unknown": 2}
Bad signals
{"empty_output": 2}
Recent records
| Lane | Scenario | Status | Provider | Score | Signal |
| scenario_battery | t1_smoke_danish_exact | 200 | Nvidia | 100.0 | |
| scenario_battery | t1_smoke_danish_exact | 200 | Nvidia | 100.0 | |
| scenario_battery | t1_smoke_danish_exact | 200 | Nvidia | 100.0 | |
| optimal_tools_skills | optimal_policy_dry_run | 200 | Nvidia | 100.0 | |
| optimal_tools_skills | optimal_research_synthesis | 200 | Nvidia | 100.0 | |
| optimal_tools_skills | optimal_file_diagnosis | 200 | Nvidia | 100.0 | |
| scenario_battery | t1_smoke_danish_exact | 200 | Nvidia | 100.0 | |
| scenario_battery | t1_smoke_danish_exact | 200 | Nvidia | 100.0 | |
Next test plan
researchsmokemodel_specific_optimaltools_skillsworkflow_langgraph_mockfinal_recommendation_pagevalidator_loopstructured_output_validationreasoning_planningresearch_synthesis
No skipped lanes recorded.
Research sources
External source enrichment present: yes
Recommendation status
final ready: Ready for public final recommendation for the listed roles, subject to normal availability monitoring.
Recommended roles
research_synthesisstructured_agent_tasksvalidator_or_critictool_using_agent_candidateprimary_candidate_for_sandbox_retest
Risks and caveats
- reviewed_soft: empty_output (2)
- policy: free_status_is_transient (1)