Chapter 2: Overview of the doc2video Case Project
Learning Objectives
Understand the project you will build over the course of this tutorial, so that every abstract concept introduced later maps onto a concrete scenario.
Project Definition in One Sentence
Compile a Markdown tutorial document into a video with voiceover, subtitles, and real terminal operations.
Input and Output
Input: A Markdown document (e.g., "Claude Code Installation Tutorial")
## Step One: Install Node
We'll use nvm to install Node 22. Open your terminal and run:
```bash
curl -o- https://raw.../nvm/install.sh | bash
```
## Step Two: Install Claude Code
Once Node is installed...
```bash
npm install -g @anthropic-ai/claude-code
```
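As a rough illustration of how an input like this might be split into steps, here is a minimal sketch. It assumes that each `##` heading starts a step, that prose lines become narration, and that lines inside fenced blocks become commands. `Step` and `parse_steps` are hypothetical names chosen for this example, not the project's actual parser.

```python
from __future__ import annotations

from dataclasses import dataclass, field

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable


@dataclass
class Step:
    """One tutorial step: a heading, its narration, and its commands."""
    title: str
    narration: list[str] = field(default_factory=list)
    commands: list[str] = field(default_factory=list)


def parse_steps(markdown: str) -> list[Step]:
    """Split a Markdown tutorial into steps (assumed: one step per '## ' heading)."""
    steps: list[Step] = []
    in_code = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_code = not in_code  # toggle on every opening or closing fence
            continue
        if in_code:
            if steps and stripped:
                steps[-1].commands.append(stripped)
        elif stripped.startswith("## "):
            steps.append(Step(title=stripped[3:].strip()))
        elif stripped and steps:
            steps[-1].narration.append(stripped)
    return steps
```

Each resulting `Step` carries the two things the rest of the pipeline needs: text to narrate and commands to run.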
Output:
dist/install-claude/
├── video.mp4    # 5-minute video showing real terminal operations
├── video.srt    # Subtitles
└── report.md    # Execution report
In the video, you will see:
- Left half-screen: Rendered Markdown document, with the current step highlighted
- Right half-screen: Real iTerm terminal, commands typed out character by character and actually executing
- Chinese voiceover: Narrating the explanation for each step
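The character-by-character typing in the right half-screen could be approximated with tmux's `send-keys` command, one key per call. The sketch below is an assumption about how this might be done, not the project's implementation: the pane target `demo:0`, the function names, and the delay value are all hypothetical.

```python
import subprocess
import time


def typing_commands(target: str, command: str) -> list[list[str]]:
    """Build one `tmux send-keys` invocation per character, then press Enter."""
    argvs = []
    for ch in command:
        # tmux treats an argument ending in ';' as a command separator,
        # so a lone semicolon is escaped with a backslash.
        key = "\\;" if ch == ";" else ch
        # -l sends the key literally instead of looking it up as a key name.
        argvs.append(["tmux", "send-keys", "-t", target, "-l", key])
    argvs.append(["tmux", "send-keys", "-t", target, "Enter"])
    return argvs


def type_into_tmux(target: str, command: str, delay: float = 0.05) -> None:
    """Type a command into a tmux pane with a visible per-key delay (assumed value)."""
    for argv in typing_commands(target, command):
        subprocess.run(argv, check=True)
        time.sleep(delay)
```

Separating `typing_commands` from `type_into_tmux` keeps the key sequence testable without a running tmux session. A call such as `type_into_tmux("demo:0", "node --version")` would then type and run the command in that pane.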
System Components
flowchart TB
Input["tutorial.md"] --> Parser["Markdown Parser"]
Parser --> Steps["Step List<br/>narration + commands"]
Steps --> TTS["edge-tts<br/>generates mp3"]
Steps --> Term["tmux Terminal<br/>real command execution"]
Steps --> Doc["Browser Panel<br/>WebSocket highlighting"]
TTS --> Recorder
Term --> Recorder
Doc --> Recorder
Recorder["ffmpeg<br/>full-screen recording"] --> Composer["Video Composer"]
Composer --> Output["video.mp4 + .srt + report.md"]
style Input fill:#e8f5e9
style Output fill:#fff3e0
Key Technical Decisions (the rationale behind these decisions is detailed in later chapters)
| Decision | Our Choice | Why |
|---|---|---|
| Real Execution vs. Simulation | Real execution | Even errors have teaching value |
| Sandbox vs. Real Desktop | Real desktop | Screen recording on macOS goes through ffmpeg's avfoundation device, which captures the real desktop |
| Voiceover Engine | edge-tts | Free, with good Chinese voice quality |
| Document Highlighting | Local HTTP + WebSocket | Avoids pulling in a heavy framework such as React |
| Synchronization Strategy | Narrate, then type | Turns reactive scheduling into a linear sequence |
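To make the "narrate, then type" strategy concrete, the sketch below lays steps out on a single linear clock and derives subtitle timings from it. It is an illustration under stated assumptions: `TimedStep`, `build_srt`, the per-character typing time, and the fixed pause after each command are all hypothetical, and a real pipeline would presumably wait for each command to finish rather than use a fixed pause.

```python
from dataclasses import dataclass


@dataclass
class TimedStep:
    narration: str            # text spoken (and shown as a subtitle)
    narration_seconds: float  # duration of the generated audio
    command: str              # command typed after the narration ends


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(steps, seconds_per_char=0.05, pause_after_command=1.0):
    """Place each narration, then its typing, one after another on one clock."""
    clock = 0.0
    blocks = []
    for index, step in enumerate(steps, start=1):
        start = clock
        end = start + step.narration_seconds
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{step.narration}\n"
        )
        # Typing starts only after the narration ends, so nothing overlaps.
        clock = end + len(step.command) * seconds_per_char + pause_after_command
    return "\n".join(blocks)
```

Because every event starts only after the previous one ends, the schedule is a simple running sum with no need to coordinate concurrent audio and terminal events.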
Why This Project is a Good "Teaching Sample"
✅ Not too big, not too small: MVP can be completed in about 3 weeks
✅ Multi-language / Multi-tech stack: Python + Web + System commands
✅ Covers the testing pyramid: Unit / Functional / E2E
✅ Real-world complexity: State machines, concurrency coordination, external dependencies
❌ Not ideal for: UI-heavy projects (this project has almost no UI)
What You Can Do Now
- Articulate what doc2video does
- Identify what type of project it is (developer tool / automation pipeline)
- Clearly assess how similar your own project is to it
The next chapter builds a mental model that organizes every concept in this tutorial into a single diagram.