Chapter 30 | Advanced: Team, CI, Extensibility

20 MIN READ | UPDATED: 2026-05-15


Learning Objectives

Apply this workflow to other scenarios: multi-person teams, CI, and new roles.

Multi-person Collaboration

flowchart TB
    Repo["Git Repository"] --> Shared["Team Shared (Committed to Git)"]
    Repo --> Personal["Personal (gitignore)"]

    Shared --> Sh1["CLAUDE.md"]
    Shared --> Sh2[".claude/agents/"]
    Shared --> Sh3[".claude/commands/"]
    Shared --> Sh4[".claude/hooks/"]
    Shared --> Sh5[".claude/settings.json"]
    Shared --> Sh6["openspec/"]

    Personal --> P1[".claude/settings.local.json"]
    Personal --> P2[".claude/telegram-notify.json"]
    Personal --> P3[".claude/.notify-sent"]

    style Shared fill:#c8e6c9
    style Personal fill:#fff9c4

Shared files are committed to Git; personal files are gitignored. Every new team member gets the same roles and rules as soon as they clone the repository.

Multi-person Collaboration State Files

review/N.md         Committed to Git (review traces have audit value)
test-reports/N.md   Committed to Git
STUCK.md            Committed to Git
e2e-report.md       Committed to Git
.notify-sent        gitignore (personal state)
dist/               gitignore (runtime artifacts)

→ If A leaves work halfway through a run, B can git pull the next day, see review/N.md, and continue the run.
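Keeping the personal side out of Git can be automated. Below is a minimal sketch in Python (the language of the case project). It assumes the personal paths listed above. ensure_gitignore is a hypothetical helper that appends only the missing entries to .gitignore, so running it twice is harmless.

```python
from pathlib import Path

# Personal/runtime paths from the lists above; everything else under .claude/ stays shared.
PERSONAL_ENTRIES = [
    ".claude/settings.local.json",
    ".claude/telegram-notify.json",
    ".claude/.notify-sent",
    "dist/",
]

def ensure_gitignore(repo_root: Path, entries=PERSONAL_ENTRIES) -> list:
    """Append missing entries to <repo_root>/.gitignore and return the ones added."""
    path = repo_root / ".gitignore"
    existing = path.read_text().splitlines() if path.exists() else []
    present = {line.strip() for line in existing}
    missing = [entry for entry in entries if entry not in present]
    if missing:
        path.write_text("\n".join(existing + missing) + "\n")
    return missing

if __name__ == "__main__":
    added = ensure_gitignore(Path("."))
    print("added:", added or "nothing")
```

Commit the resulting .gitignore together with the shared files, so every clone starts with the same split.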

CI Integration

GitHub Actions Example:

# .github/workflows/test.yml
name: tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - run: pip install -e ".[dev]"
      - run: pytest tests/ -v -m "unit or functional"
        # Does not run e2e (requires macOS GUI)

→ CI runs unit + functional tests, but not E2E.
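The -m "unit or functional" filter only works if tests carry those markers. Here is a minimal conftest.py sketch. It assumes the marker names unit, functional and e2e; the workflow above only names the first two, so e2e is our assumption. The sketch registers the markers so pytest does not warn about unknown marks. As a local safety net, it also deselects e2e tests whenever the platform is not macOS, for example when someone runs plain `pytest` on a Linux machine.

```python
# conftest.py (sketch): marker registration + e2e deselection off macOS.
import sys

MARKERS = {
    "unit": "fast, isolated tests",
    "functional": "tests that spawn real subprocesses",
    "e2e": "end-to-end tests that need the macOS GUI (avfoundation)",
}

def pytest_configure(config):
    # Register markers so `-m "unit or functional"` does not trigger unknown-mark warnings.
    for name, help_text in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {help_text}")

def pytest_collection_modifyitems(config, items):
    if sys.platform == "darwin":
        return  # macOS: keep e2e tests
    kept, deselected = [], []
    for item in items:
        (deselected if "e2e" in item.keywords else kept).append(item)
    if deselected:
        # Report them as deselected (shown in the summary) and drop them from the run.
        config.hook.pytest_deselected(items=deselected)
        items[:] = kept
```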

What CI Runs and Doesn't Run

Test Type | CI Runs | Local Runs
Unit Tests | ✅ | ✅
Functional Tests (real subprocess) | ✅ Linux runner | ✅
E2E (macOS GUI) | ✗ | ✅
Spec Consistency Check | ✅ (openspec validate) | ✅

Full CI Workflow:

flowchart TB
    Push["Developer git push"] --> Trigger["GitHub Actions Trigger"]
    Trigger --> Setup["Linux runner<br/>setup Python 3.11"]
    Setup --> Install["pip install -e .[dev]"]
    Install --> Spec["openspec validate<br/>(Spec Consistency)"]
    Install --> Unit["pytest -m unit"]
    Install --> Func["pytest -m functional"]
    Spec --> Pass{All Passed?}
    Unit --> Pass
    Func --> Pass
    Pass -->|Yes| Green["✓ PR Status Green"]
    Pass -->|No| Red["✗ Block Merge"]
    Green --> Local["Run E2E Locally"]
    Local --> Approve["Human Review + Merge"]
    Note["⚠️ E2E cannot run on CI<br/>(macOS avfoundation)"] -.-> Trigger

    style Green fill:#c8e6c9
    style Red fill:#ffcdd2
    style Approve fill:#bbdefb

Extending Roles

Add agents according to your project needs:

.claude/agents/
├── developer.md           Core
├── tester.md              Core
├── reviewer.md            Core
├── e2e-tester.md          Core
├── architect.md           Core
├── doc-writer.md          Optional: Sync README/CHANGELOG updates
├── security-reviewer.md   Optional: Scan for vulnerabilities
├── refactor-agent.md      Optional: Long-term refactoring
├── migration-agent.md     Optional: Database schema
└── perf-tester.md         Optional: Performance regression

Each new role follows the same pattern as the core roles: a permission matrix plus the 4-part system prompt.
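A permission matrix is, at heart, a lookup table from each role to what it may read and write. Below is a minimal sketch that assumes path globs are enough to express it. Real Claude Code agents express permissions through their tools configuration and settings rules, not through code like this. Only two constraints come from this tutorial: e2e-tester never reads src/, and architect does not write code. Every other glob, including the security-reviewer row, is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical matrix: role -> {"read": [globs], "write": [globs]}.
# Note: fnmatch's "*" also matches "/", so "src/*" covers the whole src/ tree.
MATRIX = {
    "developer":  {"read": ["*"], "write": ["src/*"]},
    "tester":     {"read": ["*"], "write": ["tests/*", "test-reports/*"]},
    "e2e-tester": {"read": ["openspec/*", "e2e-report.md"], "write": ["e2e-report.md"]},  # black box: no src/
    "reviewer":   {"read": ["*"], "write": ["review/*"]},
    "architect":  {"read": ["*"], "write": ["STUCK.md"]},  # diagnoses, never writes code
    # A new optional role is just one more row:
    "security-reviewer": {"read": ["*"], "write": ["review/*"]},
}

def allowed(role: str, action: str, path: str) -> bool:
    """True if `role` may perform `action` ("read" or "write") on repo-relative `path`."""
    patterns = MATRIX.get(role, {}).get(action, [])
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```

Adding a role then means adding one row and a few checks like the ones in the test. The 4-part system prompt still lives in the agent's Markdown file.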

Scenarios Not Suitable for This Workflow

✗ 1-hour one-off scripts
   → Overhead far outweighs benefits

✗ Major refactoring of an already entrenched legacy system
   → First refactor in small steps until there are interfaces a spec can describe

✗ Teams tightly coupled with Jira/Notion
   → Two systems of record will conflict; choose one first

✗ Exploratory prototypes (playing around in Jupyter notebook)
   → Specifications will slow down experimentation

✗ Pure UI / design-driven projects
   → It is hard to express requirements as testable Scenarios

Real Value Scenarios for This Workflow

✅ Medium-sized tools/libraries (like doc2video)
✅ SaaS business logic (where compliance and auditability matter)
✅ Long-term maintenance projects (>6 months)
✅ Multi-person collaboration
✅ Taking over projects from others
✅ AI-intensive development (>50% code written by AI)

Future Compatibility

Claude Code is evolving towards "native multi-agent" capabilities (see the 2.1.x changelog series). However:

✅ Spec-driven development is product-agnostic: OpenSpec can work with any agent system
✅ Role division principles are product-agnostic—any orchestrator can reuse them
✅ File-based state machine—does not depend on specific agent runtime
⚠️ Specific agent file formats may change—but migration cost is low (rewriting system prompts)
⚠️ Slash command format may change—but pseudocode logic is portable

The core methodology is transferable. Specific syntax will evolve with Claude Code upgrades.

What You Can Do Now

  • Adopt this workflow for your team (commit the shared files, gitignore the personal ones)
  • Configure CI to run tests (excluding E2E)
  • Add new role agents as needed
  • Determine which projects are not suitable for this workflow

🎓 Conclusion: What You Can Now Do

flowchart TB
    Start["You"] --> S1["✓ Structure requirements with OpenSpec"]
    Start --> S2["✓ Design a multi-role agent system"]
    Start --> S3["✓ Write CLAUDE.md to make main Claude follow rules"]
    Start --> S4["✓ Automate the entire development pipeline with /dev"]
    Start --> S5["✓ Use hooks to prevent dangerous operations + proactive notifications"]
    Start --> S6["✓ Run a complete cycle from idea to ship"]
    Start --> S7["✓ Independently diagnose when stuck"]

    S1 & S2 & S3 & S4 & S5 & S6 & S7 --> End["From Zero to Autonomous Development Pipeline"]

    style End fill:#c8e6c9

Complete Capability List

Knowledge Layer (Part III):
  ✓ Write the proposal / design / spec / tasks quartet
  ✓ Testable format for Requirement + Scenario
  ✓ Delta operations (ADDED/MODIFIED/REMOVED/RENAMED)

Governance Layer (Part IV~VI):
  ✓ 4+ role agent design
  ✓ Escalation chain + architect as fallback
  ✓ CLAUDE.md Project Constitution
  ✓ File-based state machine
  ✓ /dev orchestration commands
  ✓ Permission + hook safety guardrails
  ✓ Stop hook notification integration

Practice (Part VII):
  ✓ Startup checklist
  ✓ Three major debugging tools
  ✓ Sandbox selection
  ✓ Teams / CI / Extension

Next Steps

  • Immediately: Run a complete cycle on your own project, applying what you've learned in each chapter.
  • In a week: Write your own CLAUDE.md, adding rules unique to your project.
  • In a month: Extend role agents to adapt the pipeline to your domain.
  • In three months: Teach this workflow to others (teaching is the best way to learn).

📎 Appendix A: 20 Q&A

The following 20 questions come from common beginner queries. Each answer is 100-200 words, providing a conclusion + brief reasoning.

About Philosophy (Q1-Q5)

Q1. Why not let a single Claude write all the code?

Simple: checks and balances. If one agent writes code, writes tests, and reviews its own code—it will write "just-enough tests to pass its own code" and give itself "favorable reviews." Three months later, you won't be able to trace the issues. Multi-roles = natural code review + test-driven development + black-box verification. Each role has an independent perspective and clear responsibilities, checking each other. This is like human organizations: developers don't review their own code, QA doesn't write product code, and tech leads oversee the big picture—division of labor isn't inefficiency, it's quality infrastructure.

Q2. What is the fundamental difference between OpenSpec and Notion / Confluence?

OpenSpec is "compiled"—specs are the system's current committed contracts, and changes are migrations (each change includes a proposal + design + tasks + deltas). Notion is "snapshot-based"—just text records, without version alignment. Specific differences: (1) OpenSpec has a delta mechanism, Notion writes full versions; (2) OpenSpec is in the same Git repository as the code, with traceable commit associations, while Notion is a separate system that can drift; (3) OpenSpec specs have machine-readable schemas (Requirement/Scenario) that AI can understand, Notion is free-form text. In short: OpenSpec is like database schema migration, Notion is like a Word document.
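To make the "migration" analogy concrete, here is a minimal sketch. It models a spec as a plain dict from Requirement name to text. OpenSpec itself works on Markdown files, and this is not its implementation. Applying a change's deltas in order produces the next spec, much as a migration produces the next schema. Conflicting operations fail loudly instead of silently drifting.

```python
from dataclasses import dataclass

@dataclass
class Delta:
    op: str             # "ADDED" | "MODIFIED" | "REMOVED" | "RENAMED"
    name: str           # requirement the operation targets
    text: str = ""      # new requirement body (ADDED / MODIFIED)
    new_name: str = ""  # target name (RENAMED)

def apply_deltas(spec: dict, deltas: list) -> dict:
    """Return a new spec with deltas applied in order; the input spec is not mutated."""
    result = dict(spec)
    for d in deltas:
        if d.op == "ADDED":
            if d.name in result:
                raise ValueError(f"ADDED requirement already exists: {d.name}")
            result[d.name] = d.text
        elif d.op in ("MODIFIED", "REMOVED", "RENAMED") and d.name not in result:
            raise ValueError(f"{d.op} targets unknown requirement: {d.name}")
        elif d.op == "MODIFIED":
            result[d.name] = d.text
        elif d.op == "REMOVED":
            del result[d.name]
        elif d.op == "RENAMED":
            if d.new_name in result:
                raise ValueError(f"RENAMED target already exists: {d.new_name}")
            result[d.new_name] = result.pop(d.name)
        else:
            raise ValueError(f"unknown delta op: {d.op}")
    return result
```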

Q3. I'm already using Cursor / Copilot, do I still need this workflow?

They don't conflict; they solve problems at different layers. Cursor / Copilot provide "real-time assistance"—they complete, explain, and fix bugs as you write code, acting as in-IDE pair programming. This workflow is "batch autonomous"—you define specs and roles, and multi-agents run the full cycle autonomously, operating at a PM + team level of abstraction. The two can be combined: you use Cursor for details, and let /dev handle large chunks of work. If you only use Cursor, you'll be missing a layer when you "want to spend a week building a complete feature where requirements need to evolve long-term"—that layer is OpenSpec + multi-agent.

Q4. What scale is this workflow suitable for?

Minimum:  1-person MVP, but the project must have "long-term evolution potential"
          1-hour one-off scripts are not worth it
Maximum:  Tested on teams of up to 10 people
          Larger projects require splitting into multiple OpenSpec repositories
Optimal:  1-3 people, 3 weeks to 6 months medium-sized projects (like doc2video)

Managing the details of 50+ tasks by hand quickly becomes unmanageable; once the tooling is set up, the marginal cost per task is very low. Not suitable for: pure 1-hour exploration, major refactoring of an already entrenched legacy system, or teams tightly coupled with Jira/Notion (two conflicting systems of record).

Q5. Are multi-agents very token-intensive? Will the bill explode?

It will be more expensive than a single Claude but controllable. Our doc2video project, with 13 groups and 61 tasks, is estimated to cost $15-$30 to run—the same scope manually would take 1-2 weeks of labor, costing thousands of dollars, so the ROI is extremely high. Control methods: (1) Default to Sonnet, escalate to Opus only when stuck (escalation tier, saves 5x); (2) Threshold circuit breakers prevent deadlocks; (3) Prompt caching automatically takes effect (repeatedly read specs are cached); (4) Immediately archive raw/ after a group completes, preventing context from accumulating indefinitely. Run one group at the beginning of the month to observe actual costs, calibrate expectations, then let it run.

About Practice (Q6-Q12)

Q6. Will agents "fight" each other?

Yes, but there are built-in defenses. The most common conflict is a tug-of-war: the reviewer rejects, the developer revises, and the reviewer rejects again over new issues. Two mechanisms prevent this: (1) F2 cumulative review rules: from the second round onwards, the reviewer only re-checks the previous Required Changes and cannot raise new issues (unless the previous fix introduced new problems); (2) Deadlock threshold: 3 rejections automatically escalate to developer-deep to re-question assumptions, and if the group is still stuck, it escalates to the architect for diagnosis. In practice, human intervention is needed roughly once every 5-10 groups, far less often than in a fully manual process.
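Both defenses can be sketched as small pure functions. Some details are assumptions made for illustration: the first threshold of 3 rejections comes from the text, but reusing the same step of 3 for the architect escalation is our choice. Findings are modeled as plain strings. The function names are hypothetical.

```python
TIERS = ("developer", "developer-deep", "architect")

def tier_for(rejections: int, threshold: int = 3) -> str:
    """Escalate one tier every `threshold` rejections; the architect is the ceiling."""
    return TIERS[min(rejections // threshold, len(TIERS) - 1)]

def findings_to_report(round_no: int, findings: list,
                       previous_required: set, regressions: set) -> list:
    """F2 cumulative review: round 1 may raise anything; later rounds may only
    re-raise earlier Required Changes or problems the latest fix introduced."""
    if round_no <= 1:
        return list(findings)
    return [f for f in findings if f in previous_required or f in regressions]
```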

Q7. How do I add this workflow to my existing codebase?

Do not go back and backfill specs. Specific steps: (1) openspec init + commit; (2) Reverse-document current capabilities in specs/, covering only the parts you are confident about rather than everything; (3) All subsequent new features and refactorings follow the change process (propose → apply → archive); (4) Let specs accumulate naturally through archiving instead of actively backfilling them. Force-fitting specs onto old code is a bottomless pit: the code wasn't written with testability in mind, so reverse-engineering Scenarios from it is extremely painful. Let specs evolve with new work.

Q8. How long does a change take to complete?

Roughly estimated by the number of tasks:

Scale | Number of Tasks | Time
Small | 5~10 | 30 minutes~1 hour
Medium | 20~30 | 2~4 hours
Large | 50+ | Half a day~one day

doc2video has 61 tasks and is estimated to reach MVP in half a day. Variables: network quality, number of escalations, and frequency of human intervention. The first group is usually about twice as slow (environment setup); later groups go faster. If a change is estimated to take more than 2 days, split it: OpenSpec changes work best when each one fits within a day.

Q9. Can code written by agents be shipped directly?

Technically yes, but in practice, the final gate is you. The reviewer + tester + e2e-tester already block 90% of issues, but before shipping to production, you should: (1) Run git diff and review it yourself; (2) Run it on staging once; (3) Check review/N.md for any places the reviewer marked "I'm not sure." Multi-agents block "obviously wrong" and "clearly non-compliant" issues; the remaining 10% are "style, taste, business intuition"—this is the human domain. Treat it as a trusted junior team—they can get work done, but the lead should take a look before shipping.

Q10. If it stops halfway, will context be lost?

No. The state is entirely in files (a core design of CLAUDE.md—see Ch 20). Specifically recoverable states: (1) Checkbox progress in tasks.md; (2) Round count and Required Changes in review/N.md; (3) PASS/FAIL in test-reports/N.md; (4) Diagnosis in STUCK.md. After restarting Claude Code, running /dev status will immediately show where it left off. Even if you /clear the main conversation or switch devices, all information is restored from files. This is why we firmly believe in a "file-based state machine"—in-memory state can be lost, files cannot.

Q11. If CLAUDE.md is changed, does it need to be restarted?

It takes effect only in new sessions. The CLAUDE.md loaded in the current session is not automatically re-read—unless you /clear or restart Claude. Best practice: After modifying CLAUDE.md, immediately start a new session to verify. For major rule changes (role scheduling, state machine), a restart is essential. For minor rule changes (command cheat sheet), the current session might still work. How to check: Run /dev status and see if main Claude interprets the state according to the new rules—if it does, it has truly read them.

Q12. Can different projects share agents?

Yes, in two layers:

  • User-level (~/.claude/agents/<name>.md): Follows your account, available to all projects—suitable for generic agents like doc-writer, refactor-agent, security-reviewer.
  • Project-level (<project>/.claude/agents/): Follows the project, shared by the team—suitable for project-specific agents.

Anti-pattern: putting project-specific rules at the user level, which causes unexpected errors in other projects. Recommendation: keep generic skeletons at the user level and project-specific constraints at the project level; when a project needs a variant, create a project-level agent with the same name (manually copying over the parts you need), which then overrides the user-level one.
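The override rule comes down to a lookup order. This minimal sketch assumes Claude Code's documented behavior: a project-level subagent takes precedence over a user-level subagent with the same name. resolve_agent is a hypothetical helper for checking which file would win. It is not part of Claude Code.

```python
from pathlib import Path
from typing import Optional

def resolve_agent(name: str, project_dir: Path, home: Path) -> Optional[Path]:
    """Return the agent file that wins for `name`: project level first, then user level."""
    candidates = [
        project_dir / ".claude" / "agents" / f"{name}.md",  # shared with the team
        home / ".claude" / "agents" / f"{name}.md",         # follows your account
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None

if __name__ == "__main__":
    print(resolve_agent("doc-writer", Path.cwd(), Path.home()))
```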

About Design (Q13-Q17)

Q13. Is it a problem when agent files keep getting longer?

Yes. Over 200 lines usually indicates one of two problems:

  1. Agent responsibilities are too broad: split it into multiple agents. Example: if developer handles both writing code and configuring CI, split out a ci-agent.
  2. The file contains instructions that belong to main Claude: e.g., an entire state machine placed in developer.md should move to CLAUDE.md.

Criterion: Can you remove this sentence without affecting the agent's behavior? If yes → delete. Agent files ideally range from 60-150 lines; exceeding this means you should review for redundancy.
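The line-count guideline is easy to check mechanically. Below is a minimal sketch that assumes agent files start with YAML frontmatter containing name and description, the fields Claude Code subagent files use. The 150- and 200-line thresholds come from this answer. The frontmatter parsing is deliberately naive and uses no YAML library.

```python
from pathlib import Path

def lint_agent(text: str, soft_limit: int = 150, hard_limit: int = 200) -> list:
    """Return human-readable problems found in one agent file."""
    problems = []
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        problems.append("missing YAML frontmatter")
    else:
        try:
            end = lines.index("---", 1)  # closing delimiter of the frontmatter
        except ValueError:
            problems.append("unterminated frontmatter")
        else:
            keys = {line.split(":", 1)[0].strip() for line in lines[1:end] if ":" in line}
            for required in ("name", "description"):
                if required not in keys:
                    problems.append(f"frontmatter missing '{required}'")
    count = len(lines)
    if count > hard_limit:
        problems.append(f"{count} lines: responsibilities probably too broad, consider splitting")
    elif count > soft_limit:
        problems.append(f"{count} lines: review for redundancy")
    return problems

if __name__ == "__main__":
    for path in sorted(Path(".claude/agents").glob("*.md")):
        for problem in lint_agent(path.read_text()):
            print(f"{path}: {problem}")
```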

Q14. Why use Opus for the reviewer, can't Sonnet work?

It can, but the results are significantly worse. Opus's strength lies in "identifying subtle inconsistencies that others miss", a core competency for code review. Sonnet's reviews often devolve into "LGTM" (Looks Good To Me): if the code looks fine on the surface, it passes. It fails to catch nuanced issues such as a D5 decision being violated while the code still appears correct, a Scenario being silently weakened, or a dependency being added without justification. In our tests, Sonnet reviewers missed issues that Opus identified immediately. Since the reviewer runs rarely (1-2 times per group), defaulting it to Opus adds little cost while the quality return is large. This is the opposite of the Chapter 17 "Escalation Tier" strategy: the reviewer is a role where Opus should be the default.

Q15. Can I let agents directly commit + push to main?

Technically possible (via an allow rule), but strongly discouraged. git push is an "external side effect" that immediately affects production, CI, and other team members, so pushing should be the final, human-approved step. The recommended workflow: let the developer stop at commit (a local trace exists, but nothing is pushed); humans perform the PR review and push. When /dev completes a group, have it notify you via Telegram ("Group N Passed, Ready to Push"); you take a quick look and push manually. This one-second human gate prevents 99% of accidents. Giving agents push permission is like handing the steering wheel to the passenger.
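You can make "stop at commit" enforceable, rather than just a request, with a PreToolUse hook. This minimal sketch assumes Claude Code's hook contract: the event arrives as JSON on stdin, with tool_name and, for Bash, tool_input.command, and exit code 2 blocks the call and feeds stderr back to Claude. The regex is a simple heuristic. It errs toward blocking and does not catch every spelling, such as `git -C dir push`.

```python
import json
import re
import sys

GIT_PUSH = re.compile(r"\bgit\s+push\b")

def check(event: dict) -> tuple:
    """Return (exit_code, message); exit code 2 asks Claude Code to block the tool call."""
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")
    if GIT_PUSH.search(command):
        return 2, "git push is reserved for humans: stop at commit and report 'Ready to Push'."
    return 0, ""

if __name__ == "__main__":
    code, message = check(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)  # with exit code 2, stderr is shown to Claude
    sys.exit(code)
```

Register it as a PreToolUse hook with the matcher Bash in .claude/settings.json, and keep a matching deny rule as a second layer.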

Q16. Will too many "permission deny" rules cause agents to get stuck?

They won't "hang"; they will report an error and request correction. When a deny rule is hit, Claude Code displays the reason for refusal. Main Claude receiving the error will say, "I tried X but was blocked; I suggest using Y instead." The downside is the interruption of your flow. Best strategy: (1) Start with a conservative deny list and observe blocked records over time; (2) Add exceptions for valid but blocked operations; (3) Excessive deny rules lead to a poor autonomy experience. Our doc2video deny list was fine-tuned through iteration—we had a few false positives initially, but it became smooth after adding exceptions. Anti-pattern: Disabling a whole rule just because it blocked something once—that's like turning off the safety net.
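A simplified model of how allow and deny rules interact helps when tuning exceptions. Deny is checked first. So a real "exception" means narrowing the deny rule. An allow rule layered on top of a matching deny will not help. This minimal sketch models only the Tool(prefix:*) and exact-match forms, with "ask" as the fallback in default mode. Claude Code's real matcher handles more cases, such as compound commands and file path patterns. Treat the sketch as a mental model, not its implementation.

```python
from typing import Optional

def parse_rule(rule: str) -> tuple:
    """'Bash(npm run test:*)' -> ('Bash', 'npm run test:*'); 'Read' -> ('Read', None)."""
    if rule.endswith(")") and "(" in rule:
        tool, _, spec = rule[:-1].partition("(")
        return tool, spec
    return rule, None

def matches(rule: str, tool: str, arg: str) -> bool:
    rule_tool, spec = parse_rule(rule)
    if rule_tool != tool:
        return False
    if spec is None:
        return True                         # bare tool name matches every call
    if spec.endswith(":*"):
        return arg.startswith(spec[:-2])    # prefix rule
    return arg == spec                      # exact rule

def decide(tool: str, arg: str, allow: list, deny: list) -> str:
    if any(matches(rule, tool, arg) for rule in deny):
        return "deny"   # deny always wins, even if an allow rule also matches
    if any(matches(rule, tool, arg) for rule in allow):
        return "allow"
    return "ask"        # unlisted: prompt the user (default mode)
```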

Q17. What happens if a Stop hook fails? Does it affect Claude Code?

It does not affect the main process. If a hook exits with a non-zero code, Claude Code displays the error but keeps working: your turn completes as intended, and you just might miss the notification. In the worst case, you miss a Telegram push and have to check the transcript manually. The one exit code to avoid is 2, which Claude Code treats as a deliberate blocking signal: for a Stop hook it prevents Claude from stopping and feeds stderr back to Claude, so a notification script should never exit with 2 by accident. This is by design: a failing hook is treated as an enhancement that didn't run, not as a blocker, so a hook crash shouldn't cause a user to lose work. Anti-pattern: putting critical process logic in a hook ("the next step depends on the hook succeeding"), where a network glitch could stall your deployment. Hooks should only perform "auxiliary actions" (notifications, logging, monitoring) and never sit on the critical path.
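A notification hook built in that spirit swallows its own failures. This minimal sketch assumes the personal .claude/telegram-notify.json file holds hypothetical bot_token and chat_id keys. It calls the Telegram Bot API sendMessage method through the standard library, and it always exits 0, whatever goes wrong.

```python
import json
import os
import sys
import urllib.parse
import urllib.request
from pathlib import Path

def build_request(config: dict, text: str) -> urllib.request.Request:
    """POST request for the Telegram Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{config['bot_token']}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": config["chat_id"], "text": text}).encode()
    return urllib.request.Request(url, data=data, method="POST")

def main(config_path: Path) -> int:
    try:
        config = json.loads(config_path.read_text())
        with urllib.request.urlopen(build_request(config, "Claude Code: turn finished"), timeout=5):
            pass
    except Exception as exc:  # missing config, bad JSON, network errors, HTTP errors...
        print(f"notify skipped: {exc}", file=sys.stderr)
    return 0  # never 2: for a Stop hook that would keep Claude from stopping

if __name__ == "__main__":
    project = Path(os.environ.get("CLAUDE_PROJECT_DIR", "."))
    sys.exit(main(project / ".claude" / "telegram-notify.json"))
```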

About Extension (Q18-Q20)

Q18. Can I share settings.json across different git repositories?

Not directly: the project-level file lives in each repository's own .claude/ directory. Ways to share:

  1. User Level: Put general parts (default model, theme) in ~/.claude/settings.json; all projects inherit from here.
  2. Configuration Templates: Maintain a claude-template/ repository and cp -r template/.claude/ ./.claude/ for new projects.
  3. Skills / Plugins: Package common agents/hooks as skills or plugins for distribution.
  4. Dotfile Sync: Use tools like chezmoi or yadm to synchronize ~/.claude/.

Key principle: share the general parts, and keep the specific parts in each project. Keeping project settings per repository is deliberate: a single shared file would become a single point of failure for every project at once.
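Option 2 is often just a copy, but a plain cp -r over an existing .claude/ silently overwrites project-specific files. This minimal sketch in Python assumes a template repository laid out as <template>/.claude/. apply_template is a hypothetical helper that copies only the files the project does not have yet.

```python
import os
import shutil
from pathlib import Path

def _copy_if_missing(src: str, dst: str) -> str:
    # Never overwrite: files the project already has win over the template.
    if not os.path.exists(dst):
        shutil.copy2(src, dst)
    return dst

def apply_template(template_dir: Path, project_dir: Path) -> None:
    """Like `cp -r template/.claude/ ./.claude/`, but keeps existing project files."""
    shutil.copytree(template_dir / ".claude", project_dir / ".claude",
                    dirs_exist_ok=True, copy_function=_copy_if_missing)
```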

Q19. Will multiple agents modifying the same file cause conflicts?

Our design never dispatches concurrently within a group, and the groups themselves run serially: CLAUDE.md explicitly states that dispatch happens per group, one group at a time. Cross-group parallelism is theoretically possible but requires:

  • Files modified by the two groups to be entirely non-overlapping (inferred from tasks.md).
  • Orchestrator (dev.md) to explicitly enable a parallel mode.
  • Robust rollback mechanisms for failures.

Parallelism is disabled for the MVP—multi-agent systems are already complex enough; adding concurrency is a debugging nightmare. Wait until there's a genuine need (e.g., a 30+ group project on a tight deadline). For doc2video, 13 groups running serially finished the MVP in half a day, so parallelism wasn't necessary.
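The first precondition, non-overlapping files, is a simple set check once you know which files each group touches. This sketch assumes that mapping has already been extracted from tasks.md; how to extract it is project-specific and not shown. It only identifies candidate pairs. It does nothing about the parallel mode or rollback preconditions.

```python
from itertools import combinations

def parallel_safe_pairs(files_by_group: dict) -> list:
    """Pairs of group numbers whose touched-file sets do not overlap at all."""
    return [
        (a, b)
        for a, b in combinations(sorted(files_by_group), 2)
        if not files_by_group[a] & files_by_group[b]
    ]
```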

Q20. Will this workflow be built into Claude Code eventually? Is my current effort wasted?

Parts of it will likely be internalized (Claude Code is evolving towards "native multi-agent" support, with infrastructure such as EnterWorktree and subagents appearing in the 2.1.x changelogs), but the core methodology will not become obsolete:

  1. The orchestration layer might be simplified, and custom /dev commands might become standard features.
  2. Spec-driven development is tool-neutral—OpenSpec works with any agent system, and the methodology is portable.
  3. Principles of role division, escalation tiers, and file-based state machines are software engineering patterns that are cross-tool universal.

Conclusion: today's tool configuration might need a rewrite in a year, but the methodology you've internalized will still apply in five years. The investment pays off.


📎 Appendix B: Glossary

Organized by category, each entry provides a 1-2 sentence definition and references the chapter for details.

OpenSpec Terminology

Term | Definition | Details
Capability | A capability domain, corresponding to one specs/<name>/spec.md file | Ch 7, 12
Spec | The contract documents (specs/) the system currently commits to, built up by merging all archived changes | Ch 7
Change | An in-progress modification, made up of the 4-artifact set: proposal/design/specs/tasks | Ch 7, 10
Archive | Storage for completed changes, preserving full history | Ch 7
Proposal | The "Why + What" of a change, 1-2 pages | Ch 10
Design | The "How" of a change, recording technical decisions and trade-offs | Ch 11
Requirement | A system promise, phrased with SHALL/MUST | Ch 12
Scenario | A concrete behavior of a Requirement in WHEN/THEN format, written under a #### (four #) heading | Ch 12
Delta | Spec changes within a change (ADDED/MODIFIED/REMOVED/RENAMED) | Ch 12
Task | A "- [ ]" item in tasks.md that developers check off | Ch 13
Group | A task group defined by a "## N. Name" heading in tasks.md; the smallest unit of dispatch | Ch 13
Active change | The change currently being implemented (one that has not been archived yet) | Ch 6, 21

Claude Code Terminology

Term | Definition | Details
main Claude | The primary Claude Code process you interact with, acting as the dispatcher | Ch 3, 14
subagent | A dispatchable agent defined in .claude/agents/<name>.md | Ch 14
Skill | A standardized workflow callable by main Claude (e.g., /opsx:propose) | Ch 6, 14
Slash command | A prompt template triggered by user input starting with / | Ch 21
CLAUDE.md | The "Project Constitution" in the root directory, read automatically by main Claude at startup | Ch 19
Hook | An external script that Claude Code runs at lifecycle events, such as before or after tool calls | Ch 25, 26
PreToolUse hook | A hook that runs before a tool call and can block it | Ch 25
Stop hook | A hook that runs when main Claude finishes responding; often used for notifications | Ch 26
Permission mode | default / acceptEdits / bypassPermissions | Ch 24
$CLAUDE_PROJECT_DIR | Environment variable holding the project root, commonly used in hooks | Ch 25
MCP | Model Context Protocol, for connecting to external services such as Telegram or Gmail | Ch 26
Frontmatter | YAML metadata at the top of agent or command files | Ch 16, 21

Multi-Role and Orchestration Terminology

Term | Definition | Details
Dispatcher | The orchestrator role; main Claude's identity in a multi-agent system | Ch 19
Doer | The executor, the opposite of the dispatcher; main Claude must avoid acting as a doer | Ch 19
Briefing | Self-contained instructions attached to a subagent at dispatch time | Ch 14, 21
Round | The iteration count for the same group/agent (e.g., reviewerRound, testFailRound) | Ch 18
Threshold | The limit that breaks infinite loops (e.g., escalate after 3 rejections, call the architect after 5 test failures) | Ch 18
Escalation | Automatic upgrade to a more powerful agent (Sonnet → Opus) when stuck | Ch 18
Tier | Escalation levels: Developer / Developer-Deep / Architect | Ch 18
Marker | Fixed output prefixes (⚠️ / ✓ / ✗) that hooks use to detect significant events | Ch 21, 26
Path A | The path where a deep agent identifies spec flaws without writing code | Ch 18
Path B | The path where a deep agent reworks the task with a structurally different solution | Ch 18
STUCK.md | The architect's diagnostic report, containing the root cause + 3 options | Ch 18
State machine | File-based state machine (PENDING/DEVELOPING/TESTING/REVIEWING/...) | Ch 20
Crash-safe | State is stored in files, so work can be recovered after a process crash | Ch 20

The 4 Roles

Role | Responsibility | Model | Details
developer | Implements a specific group from tasks.md | Sonnet | Ch 15, 16
developer-deep | Upgraded developer, used when stuck | Opus | Ch 18
tester | Writes unit and functional tests | Sonnet | Ch 15
tester-deep | Upgraded tester | Opus | Ch 18
e2e-tester | Black-box end-to-end testing (forbidden from reading src/) | Sonnet | Ch 15
reviewer | Reviews against spec/design | Opus | Ch 15, 17
architect | Final diagnostic fallback (does not write code) | Opus | Ch 18

Security and Permission Terminology

Term | Definition | Details
Allow rule | A rule whose matches are approved automatically | Ch 24
Deny rule | A rule whose matches are rejected (takes precedence over allow) | Ch 24
Bypass mode | bypassPermissions: skips prompts, relying on deny rules and hooks for safety | Ch 24
Path-aware check | Validation that depends on the path (e.g., allow rm inside the project, block it outside); requires hooks | Ch 25
Bind mount | Docker mounting a host directory into a container, so files are shared while the process stays isolated in the container | Ch 29
Sandbox tier | Levels: L1 (worktree+venv) / L2 (Docker) / L3 (VM) | Ch 29

Project-Specific (doc2video) Terminology

Term | Definition
doc2video | The case project: compiles Markdown tutorial documents into videos
edge-tts | A Python package that uses Microsoft Edge's online TTS service (free) to generate mp3 voiceovers
avfoundation | macOS's built-in audio/video framework, used by ffmpeg for screen recording
tmux | A terminal multiplexer that scripts can control, used to execute real commands
:::manual | doc2video's Markdown extension marking purely narrative steps (no terminal commands)
TTS | Text-to-Speech
Headless browser | A browser without a GUI, used for automated testing
Marker (Command Completion) | Unique strings wrapped around commands sent with tmux send-keys, so command completion can be detected reliably
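The command-completion marker technique can be sketched as follows. The sketch assumes a tmux target (session/window/pane) that already exists. The names wrap, parse_done and run_in_tmux, and the __DONE_ prefix, are hypothetical and not doc2video's actual code. printf assembles the marker at run time, so the command line echoed in the pane never matches it, and the marker also carries the command's exit status.

```python
import re
import subprocess
import time
import uuid
from typing import Optional

def wrap(command: str, token: str) -> str:
    # The echoed command shows "__DONE_%s:%s", never the finished "__DONE_<token>:" string.
    return f"{command}; printf '__DONE_%s:%s\\n' {token} $?"

def parse_done(pane_text: str, token: str) -> Optional[int]:
    """Return the command's exit status once its marker appears, else None."""
    m = re.search(rf"__DONE_{re.escape(token)}:(\d+)", pane_text)
    return int(m.group(1)) if m else None

def run_in_tmux(target: str, command: str, timeout: float = 60.0) -> int:
    token = uuid.uuid4().hex[:8]  # unique per command, so old markers never match
    subprocess.run(["tmux", "send-keys", "-t", target, "-l", wrap(command, token)], check=True)
    subprocess.run(["tmux", "send-keys", "-t", target, "Enter"], check=True)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        pane = subprocess.run(["tmux", "capture-pane", "-p", "-J", "-t", target],
                              capture_output=True, text=True, check=True).stdout
        status = parse_done(pane, token)
        if status is not None:
            return status
        time.sleep(0.2)
    raise TimeoutError(f"no completion marker for {command!r} after {timeout}s")
```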

📎 Appendix C: Comparison with Other Tools

Tooling Landscape

flowchart LR
    subgraph IDE["In-IDE Real-time Assistance"]
        Cursor["Cursor"]
        Copilot["GitHub Copilot"]
        Continue["Continue"]
    end

    subgraph CLI["CLI Editor Style"]
        Aider["Aider"]
        Cline["Cline"]
    end

    subgraph Agent["Autonomous Agents"]
        Devin["Devin"]
        Sweep["Sweep AI"]
        ThisStack["This Stack<br/>(Claude Code + OpenSpec + Multi-Agent)"]
    end

    subgraph Chat["Pure Chat"]
        ClaudeWeb["Claude.ai"]
        ChatGPT["ChatGPT"]
    end

    style ThisStack fill:#c8e6c9

Comparison Matrix

Dimension | Pure Claude.ai | Cursor | Copilot | Aider | Devin | Cline | This Stack
Intervention | Dialogue | In-IDE Real-time | In-IDE Completion | CLI Editor | Full Autonomy | CLI Autonomy | CLI Orchestration + Multi-Agent
Context Mgmt | Manual | File Aware | Current File | Git Aware | Task Mgmt | Project Level | OpenSpec Spec
Multi-Agent | ✗ | ✗ | ✗ | ✗ | Partial (Internal) | Single Agent | ✅ 4-7 Roles
Spec Support | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✅ OpenSpec
Autonomy | Low | Medium | Low | Medium | Extremely High | High | High (Adjustable)
Perms/Sandbox | n/a | IDE Restricted | IDE Restricted | Git Ops | Internal Sandbox | Config | Configurable Hooks
Recoverable | ✗ | Limited | ✗ | Git | Internal | Limited | ✅ File State Machine
Cost | Monthly | Monthly | Monthly | Per Token | $$$ / Task | Per Token | Per Token
Learning Curve | Extremely Low | Low | Extremely Low | Medium | Low (Black box) | Medium | High
Best For | Quick Consult | Daily Coding | Completion | Existing Git Projects | Full Task Delegation | CLI Users | Medium-Long Term Projects

Detailed Comparison

vs Cursor

Cursor:    In-IDE "Companion"—real-time suggestions as you write.
This Stack: "Team"—you design goals, it finishes the job.

Not conflicting; use both. Cursor handles "how to write this line," and this stack handles "the feature to be built this week."

vs GitHub Copilot

Copilot:   Completion level (the next line at the cursor).
This Stack: Project level (spec → implementation → test → review).

Copilot doesn't see project intent, only local context. This stack gives the agent the full "map" via OpenSpec.

vs Aider

Aider:    Single agent, CLI operating on git, one task at a time.
This Stack: Multi-agent, orchestrator, long-running pipeline.

Aider is suited for "I want to fix this bug in this PR." This stack is for "I want to build a new feature + long-term evolution."

vs Cline / Roo (agent extensions for VS Code)

Cline:    IDE-ified Claude Code, single agent + permissions.
This Stack: Built on Claude Code, adds OpenSpec + multi-agent + escalation.

If you use Cline, you can transplant about 70% of these ideas, since Cline also supports hooks and custom prompts. The difference is that the OpenSpec toolchain needs Claude Code's skill system to work fully.

vs Devin (autonomous agent)

Devin:    Black box—you give a task, it decomposes, executes, and reports.
This Stack: White box—every step, decision, and state is visible and editable in files.

Devin is like outsourcing to a sealed engineer. This stack is like building your own controllable team. When Devin goes out of control, you can only watch the logs; when this stack goes out of control, you can modify the spec, agent, or hook.

vs Sweep AI (GitHub auto PR)

Sweep:    Issue → Black-box PR generation.
This Stack: Requirement → Spec → Multi-agent implementation → Review → PR.

Sweep is for "help me fix this issue," a one-off. This stack is for "I want to run the whole project this way long-term."

vs Pure Claude Code (no OpenSpec, no multi-agent)

Pure Claude Code:   You talk to a single Claude; it does everything.
This Stack:         Adds OpenSpec (Specification Layer) + Multi-Agent (Governance Layer).

This is the core comparison of the tutorial. Pure Claude Code hits the three pain points of Chapter 1 in medium-sized projects—requirement drift, self-certification traps, and permissive permissions. This stack adds three layers of infrastructure to pure Claude Code.

Selection Guide

flowchart TD
    Q1{How long is the project?}
    Q1 -->|1 Hour| Chat["Claude.ai Dialogue"]
    Q1 -->|1 Day to 1 Week| IDE["Cursor / Copilot"]
    Q1 -->|Over 1 Month| Q2{Long-term Evolution?}
    Q2 -->|No| Aider["Aider"]
    Q2 -->|Yes| Q3{Team Collaboration Needed?}
    Q3 -->|No| Cline["Cline / Claude Code Single Agent"]
    Q3 -->|Yes| ThisStack["✓ This Stack"]

    style ThisStack fill:#c8e6c9

Stacking Rather Than Choosing

In an actual workflow, it's likely like this:

Daily coding (at the cursor) → Cursor
Daily commits / small fixes → Aider occasionally
Feature level (a new feature) → This stack (OpenSpec + Multi-Agent)
Quick consult / learning → Claude.ai dialogue

No conflict—they solve problems at different abstraction layers.


🎯 Tutorial Complete

If you've successfully run doc2video, congratulations: you've mastered one of the most advanced AI-assisted collaborative development methodologies available in 2026.

flowchart LR
    Read["Read 30 Chapters"] --> Try["Run doc2video"]
    Try --> Adapt["Adapt to Your Project"]
    Adapt --> Teach["Teach Colleagues"]
    Teach --> Master["Truly Master It"]

    style Master fill:#c8e6c9

Feedback and Contributions

  • Found errors / Suggest improvements: Open an issue in the OpenSpec project repository.
  • Ideas / Practice sharing: PRs to join the "Practice Cases" appendix are welcome.

Acknowledgments

This tutorial is derived from the actual experience of building the doc2video project from 0 to 1. All mermaid diagrams, code snippets, and decision matrices come from the real development process—not fictional textbook cases.

Share this document with your colleagues.