Motivation
Agents that hardcode model choices cannot adapt to cost constraints, latency requirements, or governance policies. As model providers and capabilities evolve, every such agent has to be modified individually.
The Model Router Pattern introduces a routing layer between agents and model providers. Routing decisions are governed, scored, persisted, and auditable. Agents declare what they need, and the router decides which model serves that need.
Structure
- Agent calls llm_invoke(config, InferenceRequest)
- InferenceRequest carries: prompt, task_type, sensitivity, latency_budget, cost_profile
- ModelRouter scores all catalog models against the request signals
- Best-scoring model is selected
- Routing decision (model, scores, latency) persisted to state store
- LLM response returned to agent
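
The flow above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pattern's reference implementation. Only llm_invoke, InferenceRequest, ModelRouter, and the request fields come from the list above. ModelSpec, the one-point-per-signal scoring rule, the treatment of sensitivity as a hard eligibility gate, the dict-shaped config, and the in-memory list standing in for the state store are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class InferenceRequest:
    prompt: str
    task_type: str          # e.g. "summarize" or "code" (illustrative values)
    sensitivity: str        # "public" or "restricted" (illustrative values)
    latency_budget: float   # seconds the caller is willing to wait
    cost_profile: str       # "low" or "premium" (illustrative values)


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical catalog entry; `invoke` stands in for a provider call."""
    name: str
    task_types: frozenset
    approved_for_restricted: bool
    typical_latency: float  # seconds
    cost_tier: str
    invoke: Callable[[str], str]


@dataclass
class ModelRouter:
    catalog: list
    state_store: list = field(default_factory=list)  # in-memory stand-in

    def score(self, model: ModelSpec, request: InferenceRequest) -> Optional[float]:
        # Governance is treated as a hard gate: ineligible models get no score.
        if request.sensitivity == "restricted" and not model.approved_for_restricted:
            return None
        # Assumed scoring rule: one point per satisfied request signal.
        points = 0.0
        points += 1.0 if request.task_type in model.task_types else 0.0
        points += 1.0 if model.typical_latency <= request.latency_budget else 0.0
        points += 1.0 if model.cost_tier == request.cost_profile else 0.0
        return points

    def route(self, request: InferenceRequest):
        scores = {}
        for model in self.catalog:
            s = self.score(model, request)
            if s is not None:
                scores[model.name] = s
        if not scores:
            raise LookupError("no catalog model is eligible for this request")
        # max() keeps the first of tied models, so catalog order breaks ties.
        best = max((m for m in self.catalog if m.name in scores),
                   key=lambda m: scores[m.name])
        return best, scores


def llm_invoke(config: dict, request: InferenceRequest) -> str:
    router = config["router"]  # assumption: config carries the router
    model, scores = router.route(request)
    start = time.perf_counter()
    response = model.invoke(request.prompt)
    latency = time.perf_counter() - start
    # Persist the routing decision so it can be audited later.
    router.state_store.append(
        {"model": model.name, "scores": scores, "latency": latency}
    )
    return response


# Usage with two made-up catalog entries whose `invoke` just echoes the prompt.
catalog = [
    ModelSpec("small-fast", frozenset({"summarize"}), False, 0.5, "low",
              lambda p: f"[small-fast] {p}"),
    ModelSpec("large-governed", frozenset({"summarize", "code"}), True, 3.0,
              "premium", lambda p: f"[large-governed] {p}"),
]
router = ModelRouter(catalog)
config = {"router": router}
request = InferenceRequest("Summarize this note.", "summarize", "public", 1.0, "low")
response = llm_invoke(config, request)
```

The agent code only builds an InferenceRequest and calls llm_invoke; swapping providers or changing the scoring rule touches the catalog and the router, not the agent. A real deployment would likely replace the list with a durable store and the additive score with weighted or policy-driven scoring.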