Chapter 17: Model Selection Strategy
Learning Objectives
Assign models by role to avoid "burning cash with all Opus" or "getting stuck with all Sonnet."
Three-Tier Comparison (Data as of May 2026)
| Model | Input | Output | Context | Strengths | Weaknesses |
|---|---|---|---|---|---|
| Haiku 4.5 | $1/MTok | $5/MTok | 200k | Extremely fast, cheap | Weak at complex reasoning |
| Sonnet 4.6 | $3/MTok | $15/MTok | 1M | King of cost-effectiveness | Limited for top-tier architectural thinking |
| Opus 4.7 | $5/MTok | $25/MTok | 1M | Deep reasoning, finding inconsistencies | About 1.7x Sonnet's price (5x Haiku's) |
→ At these prices, Opus costs about 1.7 times as much as Sonnet per token, and 5 times as much as Haiku. If Sonnet can solve it, don't use Opus.
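To make the price gap concrete, here is a minimal sketch that turns the table's per-million-token prices into a per-request cost. The `PRICES` dictionary, the `request_cost` helper, and the 200k/20k token workload are illustrative assumptions, not part of the doc2video project.

```python
# Prices in USD per million tokens, copied from the comparison table above.
PRICES = {
    "haiku": {"input": 1.0, "output": 5.0},
    "sonnet": {"input": 3.0, "output": 15.0},
    "opus": {"input": 5.0, "output": 25.0},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the table's list prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 200k input tokens and 20k output tokens per request.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 200_000, 20_000):.2f}")
```

Swapping in your own token counts gives a quick sense of how much a role would cost on each tier before you commit to one.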
Role Matching Formula
```mermaid
flowchart TD
    Q["What is the primary function of this role?"]
    Q -->|"Mass execution<br/>Repetitive operations"| Sonnet["sonnet<br/>(Cost-effective)"]
    Q -->|"Deep reasoning<br/>Find inconsistencies"| Opus["opus<br/>(Deep thinking)"]
    Q -->|"Simple classification<br/>Batch processing"| Haiku["haiku<br/>(Cheap)"]
    Sonnet --> SonnetEx["developer / tester /<br/>e2e-tester"]
    Opus --> OpusEx["reviewer / architect /<br/>developer-deep"]
    Haiku --> HaikuEx["Log classification / Simple formatting<br/>(Not used in this project)"]
    style Sonnet fill:#bbdefb
    style Opus fill:#ffccbc
    style Haiku fill:#c5e1a5
```
Final Configuration for Our doc2video Project
| Agent | Model | Reason |
|---|---|---|
| developer | sonnet | Repeatedly implements tasks, high volume |
| developer-deep | opus | Stuck and escalated, needs to question spec |
| tester | sonnet | Translating scenarios is a mechanical task |
| tester-deep | opus | Judging if a scenario is testable requires insight |
| e2e-tester | sonnet | Black-box execution of commands, checking output |
| reviewer | opus | Finding code inconsistencies is Opus's strength |
| architect | opus | Cross-team diagnosis requires a global perspective |
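The table above can be captured as a small lookup so that every agent launch reads its model from one place. This is only a sketch: the chapter does not show how doc2video actually stores these assignments, and the names `AGENT_MODELS` and `model_for` are hypothetical.

```python
# Agent-to-model assignments copied from the configuration table above.
AGENT_MODELS = {
    "developer": "sonnet",
    "developer-deep": "opus",
    "tester": "sonnet",
    "tester-deep": "opus",
    "e2e-tester": "sonnet",
    "reviewer": "opus",
    "architect": "opus",
}


def model_for(agent: str) -> str:
    """Look up the model tier for an agent, failing loudly on unknown names."""
    try:
        return AGENT_MODELS[agent]
    except KeyError:
        raise ValueError(f"no model assigned to agent {agent!r}") from None


print(model_for("reviewer"))
```

Raising on an unknown agent, rather than silently defaulting to a tier, keeps a typo from quietly running a role on the wrong model.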
Cost Estimation (Based on Actual Project Runs)
A medium-sized team (5 tasks) completes a full cycle:
developer (sonnet) 1-2 rounds → ~$0.30
tester (sonnet) 1-2 rounds → ~$0.20
reviewer (opus) 1-2 rounds → ~$0.50
─────────────────────────────────
Subtotal ~$1.00
One escalation (3 failed rounds → developer-deep):
+ developer-deep (opus) 1 round → ~$0.40
─────────────────────────────────
Total including escalation ~$1.40
→ For our doc2video project, with 13 teams, 61 tasks, and several escalations, we estimate $15 to $30 to complete the entire project. Manual development of the same scope would take 1-2 weeks, or at least $5000 in labor costs, so AI collaboration is more than 100x cheaper.
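The arithmetic behind these estimates can be sketched as follows. The per-round figures come from the breakdown above; the function names and the escalation count of 5 are assumptions made for illustration, since the chapter only says "several escalations".

```python
# Per-cycle cost estimates in USD, copied from the cost breakdown above.
BASE_CYCLE = {"developer": 0.30, "tester": 0.20, "reviewer": 0.50}
ESCALATION_COST = 0.40  # one developer-deep (opus) round


def team_cost(escalations: int = 0) -> float:
    """Cost of one team's full cycle plus any escalations."""
    return round(sum(BASE_CYCLE.values()) + escalations * ESCALATION_COST, 2)


def project_cost(teams: int, escalations: int) -> float:
    """Cost of all teams plus the project-wide number of escalations."""
    return round(teams * team_cost() + escalations * ESCALATION_COST, 2)


print(team_cost())          # one team, no escalation
print(team_cost(1))         # one team, one escalation
print(project_cost(13, 5))  # 13 teams, 5 escalations (hypothetical count)
```

With 13 teams and 5 escalations this lands at the low end of the $15 to $30 range. The upper end presumably allows for teams that need extra rounds, which this sketch does not model.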
Escalate to Opus, Don't Start with It
```mermaid
flowchart LR
    Bad["All Opus<br/>(~1.7x Sonnet's cost)"] --> BadResult["Every team uses top-tier<br/>Simple tasks also burn cash"]
    Good["Default Sonnet<br/>Escalate to Opus when stuck"] --> GoodResult["80% of tasks run cheaply<br/>Only complex problems burn cash"]
    style Bad fill:#ffcdd2
    style Good fill:#c8e6c9
```
→ This is the escalation mechanism we'll discuss in Chapter 18.
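As a rough preview, the "default Sonnet, escalate when stuck" rule might look like the sketch below. The threshold of 3 failed rounds is taken from the cost example earlier in this chapter. The function name and the idea of tracking a failure counter per task are assumptions, and the real mechanism is the subject of Chapter 18.

```python
MAX_FAILED_ROUNDS = 3  # from the cost example: 3 failed rounds -> developer-deep


def pick_developer(failed_rounds: int) -> str:
    """Return which developer agent should run the next round of a task.

    Hypothetical helper: starts on the sonnet-backed developer and switches
    to the opus-backed developer-deep only after repeated failures.
    """
    if failed_rounds >= MAX_FAILED_ROUNDS:
        return "developer-deep"
    return "developer"


for failures in range(5):
    print(failures, pick_developer(failures))
```

Because the expensive agent is only reachable through the failure counter, simple tasks can never land on Opus by default.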
When to Downgrade to Haiku
If you have these types of light tasks:
✅ Classify logs into ERROR/WARN/INFO
✅ Convert variable names to camelCase
✅ Simple schema validation
Consider using Haiku for micro-agents. Our doc2video project doesn't have these types of tasks — so we didn't use Haiku.
Anti-Patterns
❌ All Opus: "To ensure quality with the strongest model"
→ Overpays on the roughly 80% of tasks where Sonnet is perfectly sufficient
❌ All Sonnet: "To save money"
→ Gets stuck on complex problems, ends up burning more tokens trying repeatedly
❌ Using Sonnet for the reviewer
→ Reviews easily become LGTM, fails to find subtle inconsistencies
❌ Using Opus for the developer
→ The developer runs often and repetitively, so the expensive model is wasted on routine work
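These four anti-patterns are mechanical enough to check automatically. The following is a minimal sketch of such a check, assuming assignments are held in a plain agent-to-model dictionary; `find_anti_patterns` is a hypothetical helper, not something the doc2video project ships.

```python
def find_anti_patterns(assignments: dict[str, str]) -> list[str]:
    """Return a warning for each anti-pattern listed above that applies."""
    warnings = []
    models = set(assignments.values())
    if models == {"opus"}:
        warnings.append("all agents use opus")
    if models == {"sonnet"}:
        warnings.append("all agents use sonnet")
    if assignments.get("reviewer") == "sonnet":
        warnings.append("reviewer uses sonnet")
    if assignments.get("developer") == "opus":
        warnings.append("developer uses opus")
    return warnings


# Hypothetical configuration that puts every agent on the top tier.
print(find_anti_patterns({"developer": "opus", "reviewer": "opus"}))
```

A check like this could run whenever the team configuration changes, so a well-meant "just use the strongest model" edit gets flagged before it costs anything.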
What You Can Do Now
- Assign models to each role in your own project
- Estimate total project costs
- Explain why you should escalate to the top tier rather than start with it
The next chapter will clarify the "escalation" mechanism — an advanced form of multi-agent autonomy.