Chapter 30: Advanced - Teams, CI, Extension
Learning Objectives
Apply this workflow to other scenarios: multi-person teams, CI, and new roles.
Multi-person Collaboration
flowchart TB
Repo["Git Repository"] --> Shared["Team Shared (Committed to Git)"]
Repo --> Personal["Personal (gitignore)"]
Shared --> Sh1["CLAUDE.md"]
Shared --> Sh2[".claude/agents/"]
Shared --> Sh3[".claude/commands/"]
Shared --> Sh4[".claude/hooks/"]
Shared --> Sh5[".claude/settings.json"]
Shared --> Sh6["openspec/"]
Personal --> P1[".claude/settings.local.json"]
Personal --> P2[".claude/telegram-notify.json"]
Personal --> P3[".claude/.notify-sent"]
style Shared fill:#c8e6c9
style Personal fill:#fff9c4
→ Shared files are committed to Git, personal files are gitignored. Every new team member gets the same roles + rules immediately after cloning.
Multi-person Collaboration State Files
| File | Handling |
|---|---|
| review/N.md | Committed to Git (review traces have audit value) |
| test-reports/N.md | Committed to Git |
| STUCK.md | Committed to Git |
| e2e-report.md | Committed to Git |
| .notify-sent | gitignored (personal state) |
| dist/ | gitignored (runtime artifacts) |
→ If A stops partway through a run, B can git pull the next day, read review/N.md, and continue the run from there.
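A small pre-commit guard can enforce the shared/personal split from the diagram and table above. This is a hypothetical sketch. The pattern list simply mirrors the personal entries shown there, and wiring it into a pre-commit hook is left to you.

```python
import fnmatch
import subprocess
import sys

# Hypothetical list mirroring the "Personal (gitignore)" entries above.
PERSONAL_PATTERNS = [
    ".claude/settings.local.json",
    ".claude/telegram-notify.json",
    ".claude/.notify-sent",
    "dist/*",
]


def leaked_personal_files(staged_paths):
    """Return staged paths that should have stayed local (gitignored)."""
    return [
        path
        for path in staged_paths
        if any(fnmatch.fnmatch(path, pattern) for pattern in PERSONAL_PATTERNS)
    ]


def staged_paths_from_git():
    """List paths staged for the next commit (must run inside a git checkout)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


if __name__ == "__main__":
    leaked = leaked_personal_files(staged_paths_from_git())
    for path in leaked:
        print(f"personal file staged, please unstage: {path}", file=sys.stderr)
    sys.exit(1 if leaked else 0)
```

Ignored files can only be staged with git add -f. In practice, the guard therefore catches forced adds and entries missing from .gitignore.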
CI Integration
GitHub Actions Example:
# .github/workflows/test.yml
name: tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - run: pip install -e ".[dev]"
      - run: pytest tests/ -v -m "unit or functional"
      # Does not run e2e (requires macOS GUI)
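→ CI runs unit + functional tests, but not E2E. This minimal workflow does not include the openspec validate step listed in the table below. Add it as a separate step if you want the spec consistency check to run on CI as well.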
What CI Runs and Doesn't Run
| Test Type | CI Runs | Local Runs |
|---|---|---|
| Unit Tests | ✅ | ✅ |
| Functional Tests (real subprocess) | ✅ Linux runner | ✅ |
| E2E (macOS GUI) | ❌ | ✅ |
| Spec Consistency Check | ✅ (openspec validate) | ✅ |
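The split in this table can be encoded in a small wrapper, so the same command works on CI and on a local Mac. This is a sketch. The marker names unit, functional, and e2e come from the workflow above. The function names and the environment check are assumptions; the check relies on GitHub Actions setting CI=true.

```python
import os
import subprocess
import sys


def marker_expression(platform: str, on_ci: bool) -> str:
    """Pick the pytest -m expression for this environment.

    E2E needs the macOS GUI (avfoundation), so it only runs on a local Mac.
    """
    if platform == "darwin" and not on_ci:
        return "unit or functional or e2e"
    return "unit or functional"


def main() -> int:
    on_ci = os.environ.get("CI") == "true"  # GitHub Actions sets CI=true
    expr = marker_expression(sys.platform, on_ci)
    # pytest's exit code becomes ours, so a red run still fails the job.
    return subprocess.call([sys.executable, "-m", "pytest", "tests/", "-v", "-m", expr])


if __name__ == "__main__":
    sys.exit(main())
```

pytest warns about unregistered marks (PytestUnknownMarkWarning). Register unit, functional, and e2e in your pytest configuration before relying on -m.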
Full CI Workflow:
flowchart TB
Push["Developer git push"] --> Trigger["GitHub Actions Trigger"]
Trigger --> Setup["Linux runner
setup Python 3.11"]
Setup --> Install["pip install -e .[dev]"]
Install --> Spec["openspec validate
(Spec Consistency)"]
Install --> Unit["pytest -m unit"]
Install --> Func["pytest -m functional"]
Spec --> Pass{All Passed?}
Unit --> Pass
Func --> Pass
Pass -->|Yes| Green["✓ PR Status Green"]
Pass -->|No| Red["✗ Block Merge"]
Green --> Local["Locally Run E2E"]
Local --> Approve["Human Review + Merge"]
Note["⚠️ E2E cannot run on CI
(macOS avfoundation)"] -.-> Trigger
style Green fill:#c8e6c9
style Red fill:#ffcdd2
style Approve fill:#bbdefb
Extending Roles
Add agents according to your project needs:
.claude/agents/
├── developer.md Core
├── tester.md Core
├── reviewer.md Core
├── e2e-tester.md Core
├── architect.md Core
├── doc-writer.md Optional: Sync README/CHANGELOG updates
├── security-reviewer.md Optional: Scan for vulnerabilities
├── refactor-agent.md Optional: Long-term refactoring
├── migration-agent.md Optional: Database schema
└── perf-tester.md Optional: Performance regression
Each new role follows the "permission matrix + 4-part system prompt".
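When you add a role, its row of the permission matrix maps onto the agent file's YAML frontmatter. The sketch below renders a skeleton file. It assumes the name / description / tools / model frontmatter fields used by Claude Code subagent files. The body is a placeholder for the 4-part system prompt from earlier chapters, which is not reproduced here.

```python
def render_agent_file(name: str, description: str, tools: list[str], model: str, prompt_body: str) -> str:
    """Render a skeleton .claude/agents/<name>.md file.

    `tools` is the role's row of the permission matrix: the only tools it gets.
    """
    frontmatter = [
        "---",
        f"name: {name}",
        f"description: {description}",
        f"tools: {', '.join(tools)}",
        f"model: {model}",
        "---",
    ]
    return "\n".join(frontmatter) + "\n\n" + prompt_body.strip() + "\n"


if __name__ == "__main__":
    print(render_agent_file(
        name="doc-writer",
        description="Keeps README and CHANGELOG in sync with archived changes",
        tools=["Read", "Grep", "Glob", "Edit"],
        model="sonnet",
        prompt_body="<4-part system prompt goes here>",
    ))
```

Keeping the tool list explicit makes the permission matrix reviewable in the diff whenever a role gains a tool.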
Scenarios Not Suitable for This Workflow
✗ 1-hour one-off scripts
→ Overhead far outweighs benefits
✗ Major refactoring of an already entrenched legacy system
→ First, refactor in small steps to create spec-writable interfaces
✗ Teams tightly coupled with Jira/Notion
→ Dual system conflict, choose one first
✗ Exploratory prototypes (playing around in Jupyter notebook)
→ Specifications will slow down experimentation
✗ Pure UI / design-driven projects
→ Specs are difficult to write with testable Scenarios
Real Value Scenarios for This Workflow
✅ Medium-sized tools/libraries (like doc2video)
✅ SaaS business logic (compliant, auditable)
✅ Long-term maintenance projects (>6 months)
✅ Multi-person collaboration
✅ Taking over projects from others
✅ AI-intensive development (>50% code written by AI)
Future Compatibility
Claude Code is evolving towards "native multi-agent" capabilities (see changelog 2.1.x series). However:
✅ Spec-driven is product-agnostic—OpenSpec can work with any agent system
✅ Role division principles are product-agnostic—any orchestrator can reuse them
✅ File-based state machine—does not depend on specific agent runtime
⚠️ Specific agent file formats may change—but migration cost is low (rewriting system prompts)
⚠️ Slash command format may change—but pseudocode logic is portable
→ The core methodology is transferable. Specific syntax will evolve with Claude Code upgrades.
What You Can Do Now
- Adopt this workflow for your team (gitignore + sharing)
- Configure CI to run tests (excluding E2E)
- Add new role agents as needed
- Determine which projects are not suitable for this workflow
🎓 Conclusion: What You Can Now Do
flowchart TB
Start["You"] --> S1["✓ Structure requirements with OpenSpec"]
Start --> S2["✓ Design a multi-role agent system"]
Start --> S3["✓ Write CLAUDE.md to make main Claude follow rules"]
Start --> S4["✓ Automate the entire development pipeline with /dev"]
Start --> S5["✓ Use hooks to prevent dangerous operations + proactive notifications"]
Start --> S6["✓ Run a complete cycle from idea to ship"]
Start --> S7["✓ Independently diagnose when stuck"]
S1 & S2 & S3 & S4 & S5 & S6 & S7 --> End["From Zero to Autonomous Development Pipeline"]
style End fill:#c8e6c9
Complete Capability List
Knowledge Layer (Part III):
✓ Write the proposal / design / spec / tasks quartet
✓ Testable format for Requirement + Scenario
✓ Delta operations (ADDED/MODIFIED/REMOVED/RENAMED)
Governance Layer (Part IV~VI):
✓ 4+ role agent design
✓ Escalation chain + architect as fallback
✓ CLAUDE.md Project Constitution
✓ File-based state machine
✓ /dev orchestration commands
✓ Permission + hook safety guardrails
✓ Stop hook notification integration
Practice (Part VII):
✓ Startup checklist
✓ Three major debugging tools
✓ Sandbox selection
✓ Teams / CI / Extension
Next Steps
- Immediately: Run a complete cycle on your own project, applying what you've learned in each chapter.
- In a week: Write your own CLAUDE.md, adding rules unique to your project.
- In a month: Extend role agents to adapt the pipeline to your domain.
- In three months: Teach this workflow to others (teaching is the best way to learn).
📎 Appendix A: 20 Q&A
The following 20 questions come from common beginner queries. Each answer is 100-200 words, providing a conclusion + brief reasoning.
About Philosophy (Q1-Q5)
Q1. Why not let a single Claude write all the code?
Simple: checks and balances. If one agent writes code, writes tests, and reviews its own code—it will write "just-enough tests to pass its own code" and give itself "favorable reviews." Three months later, you won't be able to trace the issues. Multi-roles = natural code review + test-driven development + black-box verification. Each role has an independent perspective and clear responsibilities, checking each other. This is like human organizations: developers don't review their own code, QA doesn't write product code, and tech leads oversee the big picture—division of labor isn't inefficiency, it's quality infrastructure.
Q2. What is the fundamental difference between OpenSpec and Notion / Confluence?
OpenSpec is "compiled"—specs are the system's current committed contracts, and changes are migrations (each change includes a proposal + design + tasks + deltas). Notion is "snapshot-based"—just text records, without version alignment. Specific differences: (1) OpenSpec has a delta mechanism, Notion writes full versions; (2) OpenSpec is in the same Git repository as the code, with traceable commit associations, while Notion is a separate system that can drift; (3) OpenSpec specs have machine-readable schemas (Requirement/Scenario) that AI can understand, Notion is free-form text. In short: OpenSpec is like database schema migration, Notion is like a Word document.
Q3. I'm already using Cursor / Copilot, do I still need this workflow?
They don't conflict; they solve problems at different layers. Cursor / Copilot provide "real-time assistance": they complete, explain, and fix code as you write it, acting as in-IDE pair programmers. This workflow is "batch autonomous": you define specs and roles, and multiple agents run the full cycle on their own, operating at the abstraction level of a PM plus a team. The two combine well: use Cursor for details and let /dev handle large chunks of work. If you only use Cursor, you are missing a layer for one case: spending a week building a complete feature whose requirements will keep evolving. That layer is OpenSpec + multi-agent.
Q4. What scale is this workflow suitable for?
- Minimum: a 1-person MVP, as long as the project has long-term evolution potential (1-hour one-off scripts are not worth it).
- Maximum: tried on projects with up to 10 people; larger projects need to be split into multiple OpenSpec repositories.
- Optimal: medium-sized projects with 1-3 people, lasting 3 weeks to 6 months (like doc2video).
Manually managing the details of 50+ tasks will lead to collapse; once the tools are set up, the marginal cost is extremely low. Not suitable for: pure 1-hour exploration / major refactoring of an already entrenched legacy system / teams tightly coupled with Jira/Notion (dual system conflict).
Q5. Are multi-agents very token-intensive? Will the bill explode?
It will be more expensive than a single Claude, but controllable. Our doc2video project has 13 groups and 61 tasks, and is estimated to cost $15-$30 to run. The same scope done manually would take 1-2 weeks of labor costing thousands of dollars, so the ROI is extremely high. Control methods:

1. Default to Sonnet and escalate to Opus only when stuck (escalation tier, saves 5x).
2. Threshold circuit breakers prevent deadlocks.
3. Prompt caching takes effect automatically (repeatedly read specs are cached).
4. Archive raw/ immediately after a group completes, so context does not accumulate indefinitely.

Start by running a single group to observe actual costs and calibrate your expectations, then let the rest run.
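The calibration step is simple arithmetic. As a rough check on the figures above: $15-$30 across 13 groups is about $1.15-$2.31 per group, or about $0.25-$0.49 per task across 61 tasks. The helper below is a hypothetical sketch that projects a change's cost linearly from the first group you run. The first group often includes environment setup (see Q8), so a projection from it may run high.

```python
def project_change_cost(first_group_cost: float, total_groups: int) -> float:
    """Linear projection: assume every group costs about as much as the first."""
    if total_groups < 1:
        raise ValueError("a change needs at least one group")
    return first_group_cost * total_groups


def per_unit_range(low: float, high: float, units: int) -> tuple[float, float]:
    """Split a total cost range evenly across groups or tasks."""
    return round(low / units, 2), round(high / units, 2)


if __name__ == "__main__":
    print(per_unit_range(15, 30, 13))  # per group → (1.15, 2.31)
    print(per_unit_range(15, 30, 61))  # per task → (0.25, 0.49)
    print(project_change_cost(1.5, 13))  # a $1.50 first group → 19.5
```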
About Practice (Q6-Q12)
Q6. Will agents "fight" each other?
Yes, but there are built-in defenses. The most common conflict: reviewer rejects, developer modifies, reviewer rejects new issues again—a tug-of-war. Two mechanisms prevent this: (1) F2 cumulative review rules—from the second round onwards, the reviewer only re-reviews the previous Required Changes, and cannot raise new issues (unless the previous fix introduced new problems); (2) Deadlock threshold—3 rejections automatically escalate to developer-deep to re-question assumptions, and if still stuck, escalate to architect for diagnosis. Ultimately, human intervention is needed approximately once every 5-10 groups, far less frequently than a fully manual process.
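The thresholds can be expressed as a small decision function. This sketch uses the numbers from the Threshold entry in Appendix B: developer-deep after 3 reviewer rejections, and architect after 5 test failures. It makes two assumptions. First, the round counters reset whenever a group moves up a tier. Second, "still stuck" means developer-deep also hits the rejection threshold. It also ignores tester-deep. The names are illustrative, not the actual /dev implementation.

```python
from enum import Enum


class Tier(Enum):
    DEVELOPER = "developer"
    DEVELOPER_DEEP = "developer-deep"
    ARCHITECT = "architect"


REJECTION_THRESHOLD = 3  # reviewer rejections before escalating one tier
TEST_FAIL_THRESHOLD = 5  # test failures before handing over to the architect


def next_tier(current: Tier, reviewer_round: int, test_fail_round: int) -> Tier:
    """Decide which tier handles the group next.

    Assumes both counters are reset whenever the group moves up a tier,
    so developer-deep gets its own budget of rejections.
    """
    if current is Tier.ARCHITECT or test_fail_round >= TEST_FAIL_THRESHOLD:
        return Tier.ARCHITECT
    if reviewer_round >= REJECTION_THRESHOLD:
        # Developer stuck: go deep. Deep still stuck: diagnose instead.
        return Tier.DEVELOPER_DEEP if current is Tier.DEVELOPER else Tier.ARCHITECT
    return current
```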
Q7. How do I add this workflow to my existing codebase?
Do not go back and backfill specs for the whole codebase. Specific steps:

1. Run openspec init and commit.
2. Write a minimal specs/ that reverse-documents current capabilities. Cover only the parts you are confident in, not everything.
3. Route all subsequent new features and refactorings through the change process (propose → apply → archive).
4. Let specs accumulate naturally through archiving instead of actively backfilling.

"Hard-filling specs" for old code is a bottomless pit. The code wasn't written with testability in mind, and reverse-engineering Scenarios is extremely painful. Let specs evolve with new work.
Q8. How long does a change take to complete?
Roughly estimated by the number of tasks:
| Scale | Number of tasks | Time |
|---|---|---|
| Small | 5~10 | 30 minutes~1 hour |
| Medium | 20~30 | 2~4 hours |
| Large | 50+ | Half a day~one day |
doc2video has 61 tasks and is estimated to complete the MVP in half a day. Variables: network quality, number of escalations, and frequency of human intervention. The first group is usually about twice as slow (environment setup); subsequent groups pick up speed. If a change is estimated to take more than 2 days, split it. OpenSpec changes work best when each one fits within a day.
Q9. Can code written by agents be shipped directly?
Technically yes, but in practice, the final gate is you. The reviewer + tester + e2e-tester already block 90% of issues, but before shipping to production, you should: (1) Run git diff and review it yourself; (2) Run it on staging once; (3) Check review/N.md for any places the reviewer marked "I'm not sure." Multi-agents block "obviously wrong" and "clearly non-compliant" issues; the remaining 10% are "style, taste, business intuition"—this is the human domain. Treat it as a trusted junior team—they can get work done, but the lead should take a look before shipping.
Q10. If it stops halfway, will context be lost?
No. The state is entirely in files (a core design of CLAUDE.md—see Ch 20). Specifically recoverable states: (1) Checkbox progress in tasks.md; (2) Round count and Required Changes in review/N.md; (3) PASS/FAIL in test-reports/N.md; (4) Diagnosis in STUCK.md. After restarting Claude Code, running /dev status will immediately show where it left off. Even if you /clear the main conversation or switch devices, all information is restored from files. This is why we firmly believe in a "file-based state machine"—in-memory state can be lost, files cannot.
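Progress lives in tasks.md, so "where did it stop?" can be answered by parsing checkboxes. This is a minimal sketch. It assumes groups are headed "## N. Name" and tasks are "- [ ]" / "- [x]" lines, the conventions listed in Appendix B. It is not how /dev status is actually implemented.

```python
import re

GROUP_RE = re.compile(r"^##\s+(\d+)\.\s+(.*\S)\s*$")
TASK_RE = re.compile(r"^\s*-\s+\[( |x|X)\]\s+")


def group_progress(tasks_md: str) -> list[tuple[int, str, int, int]]:
    """Return (group number, name, done, total) for each group in tasks.md."""
    groups = []
    for line in tasks_md.splitlines():
        header = GROUP_RE.match(line)
        if header:
            groups.append([int(header.group(1)), header.group(2), 0, 0])
            continue
        task = TASK_RE.match(line)
        if task and groups:  # checkboxes before the first group are ignored
            groups[-1][3] += 1
            if task.group(1).lower() == "x":
                groups[-1][2] += 1
    return [tuple(g) for g in groups]


def resume_point(tasks_md: str):
    """First group that still has unchecked tasks, or None if all are done."""
    for number, name, done, total in group_progress(tasks_md):
        if done < total:
            return number, name
    return None
```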
Q11. If CLAUDE.md is changed, does Claude Code need to be restarted?
It takes effect only in new sessions. The CLAUDE.md loaded in the current session is not automatically re-read—unless you /clear or restart Claude. Best practice: After modifying CLAUDE.md, immediately start a new session to verify. For major rule changes (role scheduling, state machine), a restart is essential. For minor rule changes (command cheat sheet), the current session might still work. How to check: Run /dev status and see if main Claude interprets the state according to the new rules—if it does, it has truly read them.
Q12. Can different projects share agents?
Yes, in two layers:
- User-level (~/.claude/agents/<name>.md): Follows your account and is available to all projects; suitable for generic agents like doc-writer, refactor-agent, security-reviewer.
- Project-level (<project>/.claude/agents/): Follows the project and is shared by the team; suitable for project-specific agents.
Anti-pattern: putting project-specific rules at the user level, which causes unexpected errors in other projects. Recommendation: keep generic skeletons at the user level and project-specific constraints at the project level. A project agent with the same name overrides the user-level one, so manually copy over whatever generic content it still needs.
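The two layers combine by name. The sketch below shows the lookup. It assumes, as Claude Code documents for subagents, that a project-level agent overrides a user-level agent with the same name. It also assumes each file is named after its agent. The function is hypothetical and only picks the winning file; it does not merge contents, which is why generic content has to be copied by hand.

```python
from pathlib import Path


def effective_agents(user_dir: Path, project_dir: Path) -> dict[str, Path]:
    """Map agent name -> winning file, project level overriding user level."""
    agents: dict[str, Path] = {}
    for layer in (user_dir, project_dir):  # later layers override earlier ones
        if layer.is_dir():
            for path in sorted(layer.glob("*.md")):
                agents[path.stem] = path
    return agents


if __name__ == "__main__":
    found = effective_agents(Path.home() / ".claude" / "agents", Path(".claude") / "agents")
    for name, path in sorted(found.items()):
        print(f"{name:20} {path}")
```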
About Design (Q13-Q17)
Q13. Is it wrong if agent files get longer and longer?
Yes. Over 200 lines usually indicates one of two problems:
- Agent responsibilities are too broad: split into multiple agents. Example: developer manages both "writing code + configuring CI"; split out a ci-agent.
- Writing what main Claude should write: e.g., putting the entire state machine into developer.md, when it belongs in CLAUDE.md.
Criterion: Can you remove this sentence without affecting the agent's behavior? If yes → delete. Agent files ideally range from 60-150 lines; exceeding this means you should review for redundancy.
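The line-count guideline is easy to check automatically. This sketch uses the thresholds stated above: 60-150 lines is ideal, and over 200 usually signals a design problem. The default directory and the verdict labels are assumptions.

```python
import sys
from pathlib import Path

IDEAL_MAX = 150   # beyond this, review for redundancy
SPLIT_HINT = 200  # beyond this, responsibilities are probably too broad


def classify_length(line_count: int) -> str:
    """Map an agent file's line count to a review verdict."""
    if line_count > SPLIT_HINT:
        return "split or move rules to CLAUDE.md"
    if line_count > IDEAL_MAX:
        return "review for redundancy"
    return "ok"


def lint_agents(agents_dir: Path) -> list[tuple[str, int, str]]:
    """Return (file name, line count, verdict) for every agent file."""
    report = []
    for path in sorted(agents_dir.glob("*.md")):
        count = len(path.read_text(encoding="utf-8").splitlines())
        report.append((path.name, count, classify_length(count)))
    return report


if __name__ == "__main__":
    target = Path(sys.argv[1] if len(sys.argv) > 1 else ".claude/agents")
    for name, count, verdict in lint_agents(target):
        print(f"{name:24} {count:4} lines  {verdict}")
```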
Q14. Why use Opus for the reviewer, can't Sonnet work?
It can, but the results are significantly worse. Opus's strength is spotting subtle inconsistencies that others miss, which is the core competency of code review. Sonnet's reviews often devolve into "LGTM" (Looks Good To Me): if the code looks fine on the surface, it passes. Sonnet tends to miss nuanced issues such as:

- a design decision (such as D5) being violated while the code still looks correct;
- a Scenario being silently weakened;
- a dependency being added without justification.

In our tests, Sonnet reviewers missed issues that Opus flagged immediately. The reviewer runs rarely (1-2 times per group), so Opus adds little cost but a large quality gain. This inverts the Chapter 17 "Escalation Tier" strategy: the reviewer is a role where Opus should be the default.
Q15. Can I let agents directly commit + push to main?
Technically possible (via permission allow), but strongly discouraged. git push is an "external side effect" that immediately affects production, CI, and other team members. Pushing should be the final human-approved step. The recommended workflow: let the developer stop at commit (local trace created but not pushed); humans perform PR review and push. When /dev completes a group, have it notify you via Telegram ("Group N Passed, Ready to Push")—you take a quick look and manually push. This one-second human gate prevents 99% of accidents. Giving agents push permission is like handing the steering wheel to the passenger.
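Besides a deny rule, a PreToolUse hook can enforce the "stop at commit" policy. This is a minimal sketch. It assumes the Claude Code hook contract: the hook receives the tool call as JSON on stdin (tool_name, tool_input.command), and exit code 2 blocks the call while stderr is shown to Claude. Detection is deliberately simple. It splits the command line on shell separators with shlex and looks for a git command containing push, so wrappers such as bash -c "..." slip through.

```python
import json
import shlex
import sys

SEPARATORS = {"&&", "||", ";", "|", "&", "(", ")"}


def is_git_push(command: str) -> bool:
    """True if any simple command in the line looks like `git ... push ...`."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # split on whitespace and shell punctuation
    try:
        tokens = list(lexer)
    except ValueError:  # unbalanced quotes: fall back to a crude check
        return "git" in command and "push" in command
    segment: list[str] = []
    for token in tokens + [";"]:  # sentinel flushes the last segment
        if token in SEPARATORS:
            if segment and segment[0] == "git" and "push" in segment[1:]:
                return True
            segment = []
        else:
            segment.append(token)
    return False


def main() -> int:
    event = json.load(sys.stdin)
    command = event.get("tool_input", {}).get("command", "")
    if event.get("tool_name") == "Bash" and is_git_push(command):
        print("git push is reserved for humans; stop at commit.", file=sys.stderr)
        return 2  # exit code 2 blocks the tool call; stderr goes back to Claude
    return 0


if __name__ == "__main__":
    sys.exit(main())
```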
Q16. Will too many "permission deny" rules cause agents to get stuck?
They won't "hang"; they will report an error and request correction. When a deny rule is hit, Claude Code displays the reason for refusal. Main Claude receiving the error will say, "I tried X but was blocked; I suggest using Y instead." The downside is the interruption of your flow. Best strategy: (1) Start with a conservative deny list and observe blocked records over time; (2) Add exceptions for valid but blocked operations; (3) Excessive deny rules lead to a poor autonomy experience. Our doc2video deny list was fine-tuned through iteration—we had a few false positives initially, but it became smooth after adding exceptions. Anti-pattern: Disabling a whole rule just because it blocked something once—that's like turning off the safety net.
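A toy model of the evaluation order shows one consequence of deny taking precedence over allow. An allow entry cannot rescue a call that a deny rule matches, so an "exception" usually means narrowing the deny rule. This is a simplified model using fnmatch-style patterns, not Claude Code's actual rule matcher. The rule strings only imitate the Tool(argument) style of settings.json.

```python
from fnmatch import fnmatchcase


def decide(call: str, allow: list[str], deny: list[str]) -> str:
    """Toy permission check: deny beats allow; anything unmatched asks the user."""
    if any(fnmatchcase(call, rule) for rule in deny):
        return "deny"
    if any(fnmatchcase(call, rule) for rule in allow):
        return "allow"
    return "ask"


if __name__ == "__main__":
    allow = ["Bash(git *)", "Bash(pytest*)"]
    too_broad = ["Bash(git *)"]     # first attempt: blocks every git command
    narrowed = ["Bash(git push*)"]  # after iteration: only the risky one
    for call in ["Bash(git status)", "Bash(git push origin main)", "Bash(ls)"]:
        print(call, decide(call, allow, too_broad), decide(call, allow, narrowed))
```

With the broad deny rule, git status is refused even though an allow rule matches it. Narrowing the deny rule fixes the false positive without giving up the git push block.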
Q17. What happens if a Stop hook fails? Does it affect Claude Code?
It does not affect the main process. If a hook fails (for example, it crashes with exit code 1), Claude Code displays a warning but keeps working. Your turn completes as intended; you just might miss the notification. In the worst case, you miss a Telegram push and have to check the transcript manually.

The one exception is exit code 2, which Claude Code treats as a deliberate signal rather than a failure. For a Stop hook, it keeps Claude from stopping, so a notification hook should never return it. This is by design: a hook failure is treated as non-fatal, so a crashing hook should not cost you work.

Anti-pattern: putting critical process logic in a hook ("the next step depends on the hook succeeding"), where a network glitch could stall your deployment. Hooks should only perform auxiliary actions (notifications, logging, monitoring) and never sit in the critical path.
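One way to keep a notification hook out of the critical path is to catch every error and always exit 0. This is a sketch under stated assumptions. The Telegram Bot API sendMessage method is real. The layout of .claude/telegram-notify.json (bot_token and chat_id keys) and the function names are hypothetical.

```python
import json
import os
import sys
import urllib.parse
import urllib.request
from pathlib import Path

# Hypothetical config layout: {"bot_token": "...", "chat_id": "..."}
CONFIG = Path(os.environ.get("CLAUDE_PROJECT_DIR", ".")) / ".claude" / "telegram-notify.json"


def send_telegram(text: str) -> None:
    """POST one message via the Telegram Bot API sendMessage method."""
    cfg = json.loads(CONFIG.read_text(encoding="utf-8"))
    url = f"https://api.telegram.org/bot{cfg['bot_token']}/sendMessage"
    body = urllib.parse.urlencode({"chat_id": cfg["chat_id"], "text": text}).encode()
    with urllib.request.urlopen(url, data=body, timeout=5):
        pass  # an HTTP error status raises HTTPError, handled below


def run_stop_hook(send=send_telegram) -> int:
    """Try to notify; never let a failure escape into Claude Code."""
    try:
        send("Claude Code finished a turn")
    except Exception as exc:  # missing config, network down, HTTP error, ...
        print(f"notification skipped: {exc}", file=sys.stderr)
    return 0  # never 2: for a Stop hook that would keep Claude from stopping


if __name__ == "__main__":
    sys.exit(run_stop_hook())
```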
About Extension (Q18-Q20)
Q18. Can I share settings.json across different git repositories?
Not at the project level (as it lives in .claude/ within the project). Sharing methods:
- User level: Put general parts (default model, theme) in ~/.claude/settings.json; all projects inherit from it.
- Configuration templates: Maintain a claude-template/ repository and run cp -r template/.claude/ ./.claude/ for new projects.
- Skills / Plugins: Package common agents/hooks as skills or plugins for distribution.
- Dotfile sync: Use tools like chezmoi or yadm to synchronize ~/.claude/.
Key principle: share the general, keep the specific in the project. By design, no mechanism shares a single project-level settings.json across repositories, so one bad edit cannot break every project at once.
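The template approach can skip personal files automatically. This sketch assumes a local checkout of a hypothetical claude-template/ repository containing a .claude/ directory. It uses shutil.copytree with dirs_exist_ok=True (Python 3.8+), so it also works when the target .claude/ already exists. The ignore list mirrors the personal files listed at the start of this chapter.

```python
import shutil
from pathlib import Path

# Personal files stay out of new projects (they are gitignored anyway).
PERSONAL = shutil.ignore_patterns("settings.local.json", "telegram-notify.json", ".notify-sent")


def bootstrap(template_root: Path, project_root: Path) -> list[str]:
    """Copy template/.claude into project/.claude; return every file now there."""
    src = template_root / ".claude"
    dst = project_root / ".claude"
    shutil.copytree(src, dst, ignore=PERSONAL, dirs_exist_ok=True)
    return sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file())


if __name__ == "__main__":
    for name in bootstrap(Path("../claude-template"), Path(".")):
        print(name)
```

Re-running it overwrites files of the same name. Treat it as a one-time bootstrap, not a sync.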
Q19. Will multiple agents modifying the same file cause conflicts?
Our design does not allow concurrent dispatch for the same group—each group is processed serially. CLAUDE.md explicitly states that dispatch is per group and groups are serial. Cross-group parallelism is theoretically possible but requires:
- Files modified by the two groups to be entirely non-overlapping (inferred from tasks.md).
- The orchestrator (dev.md) to explicitly enable a parallel mode.
- Robust rollback mechanisms for failures.
Parallelism is disabled for the MVP—multi-agent systems are already complex enough; adding concurrency is a debugging nightmare. Wait until there's a genuine need (e.g., a 30+ group project on a tight deadline). For doc2video, 13 groups running serially finished the MVP in half a day, so parallelism wasn't necessary.
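The first precondition is mechanical to check once each group's file list is known. This is a sketch. How the file lists are inferred from tasks.md is left open, so the function takes them as input. The group numbers and paths in the example are made up.

```python
from itertools import combinations


def conflicting_pairs(group_files: dict[int, set[str]]) -> list[tuple[int, int, set[str]]]:
    """Return every pair of groups that would touch at least one common file."""
    conflicts = []
    for (a, files_a), (b, files_b) in combinations(sorted(group_files.items()), 2):
        shared = files_a & files_b
        if shared:
            conflicts.append((a, b, shared))
    return conflicts


if __name__ == "__main__":
    # Made-up example: file lists per group, however you derive them.
    groups = {
        1: {"src/parser.py"},
        2: {"src/render.py", "src/cli.py"},
        3: {"src/cli.py", "tests/test_cli.py"},
    }
    for a, b, shared in conflicting_pairs(groups):
        print(f"groups {a} and {b} both touch {sorted(shared)}: keep them serial")
```

Only pairs absent from the result are even candidates for parallel dispatch; the orchestrator switch and rollback requirements still apply.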
Q20. Will this workflow be built into Claude Code eventually? Is my current effort wasted?
Parts of it will likely be internalized (Claude Code is evolving towards "native multi-agent" with infrastructures like EnterWorktree and subagent in the 2.1.x changelogs), but the core methodology will not become obsolete:
- The orchestration layer might be simplified, and custom /dev commands might become standard features.
- Spec-driven development is tool-neutral: OpenSpec works with any agent system, and the methodology is portable.
- Principles of role division, escalation tiers, and file-based state machines are software engineering patterns that are cross-tool universal.
Conclusion: Today's tool configuration might need a rewrite in a year, but the methodology in your mind will still apply in five. The investment pays off.
📎 Appendix B: Glossary
Entries are organized by category; each gives a 1-2 sentence definition and the chapter(s) with details.
OpenSpec Terminology
| Term | Definition | Details |
|---|---|---|
| Capability | A capability domain, corresponding to one specs/<name>/spec.md file | Ch 7, 12 |
| Spec | The contract document (specs/) currently committed by the system, accumulated and merged from all archived changes | Ch 7 |
| Change | An ongoing modification, including the 4-artifact set: proposal/design/specs/tasks | Ch 7, 10 |
| Archive | Storage for completed changes, preserving full history | Ch 7 |
| Proposal | The "Why + What" of a change, 1-2 pages | Ch 10 |
| Design | The "How" of a change, recording technical decisions and trade-offs | Ch 11 |
| Requirement | A system promise, using SHALL/MUST phrasing | Ch 12 |
| Scenario | A specific behavior for a Requirement, in WHEN/THEN format, marked with a #### heading | Ch 12 |
| Delta | Spec changes within a change (ADDED/MODIFIED/REMOVED/RENAMED) | Ch 12 |
| Task | A - [ ] item in tasks.md that can be checked off by developers | Ch 13 |
| Group | A task group defined by ## N. Name in tasks.md; the smallest unit of dispatch | Ch 13 |
| Active change | The change currently being implemented (the one where archive is marked false) | Ch 6, 21 |
Claude Code Terminology
| Term | Definition | Details |
|---|---|---|
| main Claude | The primary Claude Code process you interact with, acting as the dispatcher | Ch 3, 14 |
| subagent | A dispatchable agent defined via .claude/agents/<name>.md | Ch 14 |
| Skill | A standardized workflow callable by main Claude (e.g., /opsx:propose) | Ch 6, 14 |
| Slash command | A prompt template triggered by user input starting with / | Ch 21 |
| CLAUDE.md | The "Project Constitution" in the root directory, read automatically by main Claude at startup | Ch 19 |
| Hook | An external script triggered by Claude Code before or after tool calls | Ch 25, 26 |
| PreToolUse hook | A hook triggered before tool calls, capable of interception | Ch 25 |
| Stop hook | Triggered after main Claude completes a block of output, often used for notifications | Ch 26 |
| Permission mode | default / acceptEdits / bypassPermissions | Ch 24 |
| $CLAUDE_PROJECT_DIR | Environment variable for the project root, commonly used in hooks | Ch 25 |
| MCP | Model Context Protocol, for connecting to external services like Telegram or Gmail | Ch 26 |
| Frontmatter | YAML metadata at the top of agent or command files | Ch 16, 21 |
Multi-Role and Orchestration Terminology
| Term | Definition | Details |
|---|---|---|
| Dispatcher | The orchestrator role, the identity of main Claude in a multi-agent system | Ch 19 |
| Doer | The executor, opposite to the dispatcher—main Claude must avoid being a doer | Ch 19 |
| Briefing | Self-contained instructions attached to a subagent during dispatch | Ch 14, 21 |
| Round | The number of iterations for the same group/agent (e.g., reviewerRound, testFailRound) | Ch 18 |
| Threshold | The limit for infinite loops (e.g., upgrade after 3 rejections, architect after 5 test failures) | Ch 18 |
| Escalation | Automatic upgrade to a more powerful agent (Sonnet → Opus) when stuck | Ch 18 |
| Tier | Escalation levels: Developer / Developer-Deep / Architect | Ch 18 |
| Marker | Fixed prefixes in output (⚠️ / ✓ / ✗) for hooks to identify significant events | Ch 21, 26 |
| Path A | The path where a deep agent identifies spec flaws without writing code | Ch 18 |
| Path B | The path where a deep agent reworks using a structurally different solution | Ch 18 |
| STUCK.md | A diagnostic report by the architect containing root cause + 3 options | Ch 18 |
| State machine | File-based state machine (PENDING/DEVELOPING/TESTING/REVIEWING/...) | Ch 20 |
| Crash-safe | State stored in files, enabling recovery after process crashes | Ch 20 |
The 4 Roles
| Role | Responsibility | Model | Details |
|---|---|---|---|
| developer | Implements a specific group from tasks.md | Sonnet | Ch 15, 16 |
| developer-deep | Upgraded version of the developer (used when stuck) | Opus | Ch 18 |
| tester | Writes unit and functional tests | Sonnet | Ch 15 |
| tester-deep | Upgraded version of the tester | Opus | Ch 18 |
| e2e-tester | Black-box end-to-end testing (forbidden from reading src/) | Sonnet | Ch 15 |
| reviewer | Reviews against spec/design | Opus | Ch 15, 17 |
| architect | Final diagnostic fallback (does not write code) | Opus | Ch 18 |
Security and Permission Terminology
| Term | Definition | Details |
|---|---|---|
| Allow rule | A rule where matches are automatically approved | Ch 24 |
| Deny rule | A rule where matches are rejected (takes precedence over allow) | Ch 24 |
| Bypass mode | bypassPermissions: skips prompts, relying on deny rules and hooks for safety | Ch 24 |
| Path-aware check | Path-sensitive validation (e.g., allow rm within the project, block it outside); requires hooks | Ch 25 |
| Bind mount | Docker mechanism that mounts a host directory into a container, so files are shared while processes stay isolated inside the container | Ch 29 |
| Sandbox tier | Levels: L1 (worktree+venv) / L2 (Docker) / L3 (VM) | Ch 29 |
Project-Specific (doc2video) Terminology
| Term | Definition |
|---|---|
| doc2video | Case project—Compiling Markdown tutorial documents into videos |
| edge-tts | Python package that uses Microsoft Edge's online text-to-speech service (free) to generate MP3 voiceovers |
| avfoundation | macOS built-in audio-video framework, used by ffmpeg for screen recording |
| tmux | Terminal multiplexer controllable by scripts, used for real command execution |
| :::manual | Markdown extension for doc2video, marking purely narrative steps (no terminal commands) |
| TTS | Text-to-Speech |
| Headless browser | Browser without a GUI, used for automated testing |
| Marker (Command Completion) | Unique strings wrapping tmux send-keys for reliable command completion detection |
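The marker technique from the last row can be sketched as follows. Assumptions: the target shell is POSIX-like, the marker line carries the exit status via $?, and the helper names and target string are hypothetical. The check requires the marker to fill a whole line, because the echoed command line also contains the marker text. tmux send-keys (with -l for literal text) and capture-pane -p are standard tmux commands.

```python
import re
import subprocess
import time
import uuid


def wrap_command(command: str, marker: str) -> str:
    """Append an echo that prints '<marker>:<exit status>' once the command ends."""
    return f'{command}; echo "{marker}:$?"'


def find_exit_status(pane_text: str, marker: str):
    """Return the exit status once the marker line appears, else None.

    The echoed command line also contains the marker, but followed by a
    literal $?, so only the real output line matches '<marker>:<digits>'.
    """
    match = re.search(rf"^{re.escape(marker)}:(\d+)$", pane_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def run_in_tmux(target: str, command: str, timeout: float = 60.0) -> int:
    """Type a command into a tmux pane and wait for its completion marker."""
    marker = f"__DONE_{uuid.uuid4().hex[:8]}__"
    # -l sends the text literally; Enter is sent as a separate key.
    subprocess.run(["tmux", "send-keys", "-t", target, "-l", wrap_command(command, marker)], check=True)
    subprocess.run(["tmux", "send-keys", "-t", target, "Enter"], check=True)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Visible pane only; add "-S", "-" to search the scrollback as well.
        pane = subprocess.run(["tmux", "capture-pane", "-p", "-t", target],
                              capture_output=True, text=True, check=True).stdout
        status = find_exit_status(pane, marker)
        if status is not None:
            return status
        time.sleep(0.2)
    raise TimeoutError(f"no completion marker within {timeout}s: {command!r}")
```

A random marker per command keeps output from an earlier command from being mistaken for the current one.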
📎 Appendix C: Comparison with Other Tools
Tooling Landscape
flowchart LR
subgraph IDE["In-IDE Real-time Assistance"]
Cursor["Cursor"]
Copilot["GitHub Copilot"]
Continue["Continue"]
end
subgraph CLI["CLI Editor Style"]
Aider["Aider"]
Cline["Cline"]
end
subgraph Agent["Autonomous Agents"]
Devin["Devin"]
Sweep["Sweep AI"]
ThisStack["This Stack
(Claude Code+OpenSpec+Multi-Agent)"]
end
subgraph Chat["Pure Chat"]
ClaudeWeb["Claude.ai"]
ChatGPT["ChatGPT"]
end
style ThisStack fill:#c8e6c9
Comparison Matrix
| Dimension | Pure Claude.ai | Cursor | Copilot | Aider | Devin | Cline | This Stack |
|---|---|---|---|---|---|---|---|
| Intervention | Dialogue | In-IDE Real-time | In-IDE Completion | CLI Editor | Full Autonomy | CLI Autonomy | CLI Orchestration + Multi-Agent |
| Context Mgmt | Manual | File Aware | Current File | Git Aware | Task Mgmt | Project Level | OpenSpec Spec |
| Multi-Agent | ❌ | ❌ | ❌ | ❌ | Partial (Internal) | Single Agent | ✅ 4-7 Roles |
| Spec Support | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ OpenSpec |
| Autonomy | Low | Medium | Low | Medium | Extremely High | High | High (Adjustable) |
| Perms/Sandbox | n/a | IDE Restricted | IDE Restricted | Git Ops | Internal Sandbox | Config | Configurable Hooks |
| Recoverable | ❌ | Limited | ❌ | Git | Internal | Limited | ✅ File State Machine |
| Cost | Monthly | Monthly | Monthly | Per Token | $$$ / Task | Per Token | Per Token |
| Learning Curve | Extremely Low | Low | Extremely Low | Medium | Low (Black box) | Medium | High |
| Best For | Quick Consult | Daily Coding | Completion | Existing Git Projects | Full Task Delegation | CLI Users | Medium-Long Term Projects |
Detailed Comparison
vs Cursor
Cursor: In-IDE "Companion"—real-time suggestions as you write.
This Stack: "Team"—you design goals, it finishes the job.
Not conflicting; use both. Cursor handles "how to write this line," and this stack handles "the feature to be built this week."
vs GitHub Copilot
Copilot: Completion level (the next line at the cursor).
This Stack: Project level (spec → implementation → test → review).
Copilot sees only local context, not project intent. This stack gives the agent the full "map" via OpenSpec.
vs Aider
Aider: Single agent, CLI operating on git, one task at a time.
This Stack: Multi-agent, orchestrator, long-running pipeline.
Aider is suited for "I want to fix this bug in this PR." This stack is for "I want to build a new feature + long-term evolution."
vs Cline / Roo (Claude client within VS Code)
Cline: IDE-ified Claude Code, single agent + permissions.
This Stack: Built on Claude Code, adds OpenSpec + multi-agent + escalation.
If you use Cline, you can transplant 70% of these ideas, since Cline also supports hooks and custom prompts. The catch is that the OpenSpec toolchain needs Claude Code's skill system to work fully.
vs Devin (autonomous agent)
Devin: Black box—you give a task, it decomposes, executes, and reports.
This Stack: White box—every step, decision, and state is visible and editable in files.
Devin is like outsourcing to a sealed engineer. This stack is like building your own controllable team. When Devin goes out of control, you can only watch the logs; when this stack goes out of control, you can modify the spec, agent, or hook.
vs Sweep AI (GitHub auto PR)
Sweep: Issue → Black-box PR generation.
This Stack: Requirement → Spec → Multi-agent implementation → Review → PR.
Sweep is for "help me fix this issue," a one-off. This stack is for "I want to run the whole project this way long-term."
vs Pure Claude Code (no OpenSpec, no multi-agent)
Pure Claude Code: You talk to a single Claude; it does everything.
This Stack: Adds OpenSpec (Specification Layer) + Multi-Agent (Governance Layer).
This is the core comparison of the tutorial. Pure Claude Code hits the three pain points of Chapter 1 in medium-sized projects—requirement drift, self-certification traps, and permissive permissions. This stack adds three layers of infrastructure to pure Claude Code.
Selection Guide
flowchart TD
Q1{How long is the project?}
Q1 -->|1 Hour| Chat["Claude.ai Dialogue"]
Q1 -->|1 Day to 1 Week| IDE["Cursor / Copilot"]
Q1 -->|Over 1 Month| Q2{Long-term Evolution?}
Q2 -->|No| Aider["Aider"]
Q2 -->|Yes| Q3{Team Collaboration Needed?}
Q3 -->|No| Cline["Cline / Claude Code Single Agent"]
Q3 -->|Yes| ThisStack["✓ This Stack"]
style ThisStack fill:#c8e6c9
Stacking Rather Than Choosing
In an actual workflow, it's likely like this:
Daily coding (at the cursor) → Cursor
Daily commits / small fixes → Aider occasionally
Feature level (a new feature) → This stack (OpenSpec + Multi-Agent)
Quick consult / learning → Claude.ai dialogue
No conflict—they solve problems at different abstraction layers.
🎯 Tutorial Complete
If you've successfully run doc2video, congratulations—you've mastered the most advanced AI collaborative development methodology of 2026.
flowchart LR
Read["Read 30 Chapters"] --> Try["Run doc2video"]
Try --> Adapt["Adapt to Your Project"]
Adapt --> Teach["Teach Colleagues"]
Teach --> Master["Truly Master It"]
style Master fill:#c8e6c9
Feedback and Contributions
- Found errors / Suggest improvements: Open an issue in the OpenSpec project repository.
- Ideas / Practice sharing: PRs to join the "Practice Cases" appendix are welcome.
Acknowledgments
This tutorial is derived from the actual experience of building the doc2video project from 0 to 1. All mermaid diagrams, code snippets, and decision matrices come from the real development process—not fictional textbook cases.
Share this document with your colleagues.