Chapter 02 | Example Project doc2video: The Big Picture

Updated: 2026-05-15

Chapter 2: Overview of the doc2video Example Project

Learning Objectives

Understand the project we will build throughout this tutorial, so that every abstract concept introduced later can be mapped onto a concrete scenario.

Project Definition in One Sentence

Compile a Markdown tutorial document into a video with voiceover, subtitles, and real terminal operations.

Input and Output

Input: A Markdown document (e.g., "Claude Code Installation Tutorial")

````markdown
## Step One: Install Node
We'll use nvm to install Node 22. Open your terminal and run:
```bash
curl -o- https://raw.../nvm/install.sh | bash
```

## Step Two: Install Claude Code
Once Node is installed...
```bash
npm install -g @anthropic-ai/claude-code
```
````
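The Markdown Parser stage has to turn a document like the sample input into a list of steps, each carrying narration and commands. Below is a minimal sketch, assuming that every `## ` heading starts a new step, plain prose becomes narration, and the contents of fenced code blocks become commands. The `Step` class and `parse_steps` function are hypothetical names for illustration, not the project's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One tutorial step: a title, narration lines, and shell commands to run."""
    title: str
    narration: list[str] = field(default_factory=list)
    commands: list[str] = field(default_factory=list)


FENCE = "`" * 3  # a Markdown code fence (three backticks)


def parse_steps(markdown: str) -> list[Step]:
    """Split a tutorial into steps, assuming each '## ' heading starts a new step."""
    steps: list[Step] = []
    in_code = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            # Opening or closing fence: toggle state; the fence line itself is not content.
            in_code = not in_code
            continue
        if in_code:
            # Lines inside a code block are treated as commands for the current step.
            if steps and stripped:
                steps[-1].commands.append(stripped)
        elif line.startswith("## "):
            steps.append(Step(title=line[3:].strip()))
        elif steps and stripped:
            # Any other non-empty prose line is narration for the current step.
            steps[-1].narration.append(stripped)
    return steps
```

Applied to the sample input, this yields two steps, each with one narration paragraph and one command. A real parser would more likely use a proper Markdown library than line-by-line matching.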

Output:

```
dist/install-claude/
├── video.mp4    5-minute video showing real terminal operations
├── video.srt    Subtitles
└── report.md    Execution report
```
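The `video.srt` file uses the standard SubRip layout: a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the caption text, and a blank line between entries. Here is a minimal sketch that renders `(start, end, text)` cues in that format, assuming the cue timings come from the measured voiceover durations; `format_timestamp` and `to_srt` are hypothetical helpers.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) cues as a SubRip (.srt) document."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        # Each entry: number, time range, caption text, then a blank separator line.
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

For example, `to_srt([(0.0, 3.5, "Step one")])` produces a single entry timed from `00:00:00,000` to `00:00:03,500`.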

In the video, you will see:

  • Left half of the screen: the rendered Markdown document, with the current step highlighted
  • Right half of the screen: a real iTerm terminal, where commands are typed character by character and actually executed
  • Chinese voiceover: narration explaining each step
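One way to get the character-by-character typing effect is to send each character to a tmux pane with `tmux send-keys -l` (the `-l` flag sends the keys literally), pausing briefly between characters, and then send `Enter` to execute the command. The sketch below separates building the argument lists from running them, so the plan can be inspected without a tmux server. The target name, the delay value, and the function names are assumptions; the real project may drive tmux differently.

```python
import subprocess
import time


def typing_commands(target: str, command: str) -> list[list[str]]:
    """Build one `tmux send-keys -l` call per character, then one call for Enter."""
    calls = []
    for ch in command:
        # tmux treats a bare ';' argument as a command separator, so escape it.
        key = "\\;" if ch == ";" else ch
        calls.append(["tmux", "send-keys", "-t", target, "-l", key])
    # Without -l, "Enter" is interpreted as a key name, which runs the command.
    calls.append(["tmux", "send-keys", "-t", target, "Enter"])
    return calls


def type_into_pane(target: str, command: str, delay: float = 0.05) -> None:
    """Replay the calls with a short pause so the typing is visible on screen."""
    for argv in typing_commands(target, command):
        subprocess.run(argv, check=True)
        time.sleep(delay)
```

Usage would look like `type_into_pane("doc2video", "node --version")`, where `doc2video` is a hypothetical tmux session already shown in the iTerm window.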

System Components

```mermaid
flowchart TB
    Input["tutorial.md"] --> Parser["Markdown Parser"]
    Parser --> Steps["Step List<br/>narration + commands"]
    Steps --> TTS["edge-tts<br/>generates mp3"]
    Steps --> Term["tmux Terminal<br/>real command execution"]
    Steps --> Doc["Browser Panel<br/>WebSocket highlighting"]
    TTS --> Recorder
    Term --> Recorder
    Doc --> Recorder
    Recorder["ffmpeg<br/>full-screen recording"] --> Composer["Video Composer"]
    Composer --> Output["video.mp4 + .srt + report.md"]
    style Input fill:#e8f5e9
    style Output fill:#fff3e0
```
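On macOS, the Recorder stage can capture the full screen with ffmpeg's `avfoundation` input device. The sketch below builds such a command and starts and stops it from Python. The screen device index (`"1"` here) is an assumption, because it varies by machine; it can be looked up with `ffmpeg -f avfoundation -list_devices true -i ""`. This sketch records video only and assumes the voiceover is mixed in later by the Video Composer; the function names are hypothetical.

```python
import subprocess


def screen_record_command(output: str, screen_index: str = "1", fps: int = 30) -> list[str]:
    """Build an ffmpeg command that records one macOS screen, without audio."""
    return [
        "ffmpeg",
        "-y",                          # overwrite the output file if it exists
        "-f", "avfoundation",          # macOS capture input device
        "-framerate", str(fps),
        "-capture_cursor", "1",        # include the mouse pointer
        "-i", f"{screen_index}:none",  # "<video device>:<audio device>", no audio
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",         # widely playable pixel format
        output,
    ]


def start_recording(output: str) -> subprocess.Popen:
    """Start ffmpeg in the background, keeping stdin open so it can be stopped later."""
    return subprocess.Popen(screen_record_command(output), stdin=subprocess.PIPE)


def stop_recording(proc: subprocess.Popen) -> None:
    """Send 'q' so ffmpeg stops cleanly and finalizes the MP4 file."""
    proc.communicate(input=b"q", timeout=30)
```

Stopping with `q` rather than killing the process matters for MP4 output: the container index is written only when ffmpeg exits normally.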

Key Technical Decisions (the rationale for each is covered in later chapters)

| Decision | Our Choice | Why |
|---|---|---|
| Real execution vs. simulation | Real execution | Even errors have teaching value |
| Sandbox vs. real desktop | Real desktop | Screen recording on macOS requires avfoundation |
| Voiceover engine | edge-tts | Free, with good Chinese voice quality |
| Document highlighting | Local HTTP + WebSocket | Avoids pulling in a heavy framework such as React |
| Synchronization strategy | Narrate, then type | Turns reactive scheduling into a linear sequence |
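"Narrate, then type" means each step runs as a fixed sequence: highlight the step in the document panel, play its narration to completion, then type and run its commands. Because nothing overlaps, the orchestration reduces to a plain loop instead of event-driven coordination between voice, terminal, and browser. A minimal sketch, assuming each callback blocks until its action has finished; `run_steps` and its callback parameters are hypothetical names.

```python
from typing import Callable, Sequence


def run_steps(
    steps: Sequence[tuple[str, list[str]]],
    highlight: Callable[[int], None],
    speak: Callable[[str], None],
    type_and_run: Callable[[str], None],
) -> None:
    """Run each (narration, commands) step strictly in order.

    Every callback is assumed to block until its action completes, so the
    next action never starts while the previous one is still in progress.
    """
    for index, (narration, commands) in enumerate(steps):
        highlight(index)           # e.g. push a highlight message over the WebSocket
        speak(narration)           # returns only after the voiceover clip has played
        for command in commands:
            type_and_run(command)  # returns only after the command has exited
```

In this sketch the trade-off is explicit: narration and typing never overlap, which may make a video somewhat longer, but the order of events is deterministic and easy to test by passing in recording callbacks.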

Why This Project is a Good "Teaching Sample"

✅ Not too big, not too small: the MVP can be completed in about 3 weeks
✅ Multiple languages and tech stacks: Python + Web + system commands
✅ Covers the testing pyramid: Unit / Functional / E2E
✅ Real-world complexity: State machines, concurrency coordination, external dependencies
❌ Not ideal for: UI-heavy projects (this project has almost no UI)

What You Can Do Now

  • Articulate what doc2video does
  • Identify what type of project it is (developer tool / automation pipeline)
  • Judge how closely your own project resembles it

The next chapter builds a mental model that organizes all of this tutorial's concepts into a single diagram.