lesson-16

20 MIN READ | UPDATED: 2026-05-07

Should an AI Content Agency adopt a flat discussion model or a top-down pyramid reporting structure?

Welcome back to the LangGraph Multi-Agent Masterclass. I am your instructor.

Over the past 15 episodes, we built our "AI Content Agency" from scratch. We now have four highly capable team members: the Planner, the Researcher, the Writer, and the Editor. Previously, we had them collaborate through a simple Linear Flow, passing the baton to churn out viral articles one after another.

But let's be real—when is real-world business ever that ideal? Yesterday, the agency received a massive order: "Please write an in-depth industry analysis on Apple's newly released Vision Pro, output a lifestyle promotional post for Xiaohongshu/Instagram, and provide a matching Twitter thread."

If we stick to our old "relay race" model, the Researcher would gather the data and toss it to the Writer. The Writer would instantly freeze: "Wait, do I write the deep dive first, or the social media post?" The entire workflow would grind to a halt.

When a single request fans out into several deliverables, a single-line flow breaks down, and it is time to refactor our multi-agent architecture. Today, we are going to explore and implement the two core schools of multi-agent architecture: Flat and Hierarchical. We will introduce a true "Manager" mechanism to our agency, transforming your Agent team from a "makeshift crew" into a "well-oiled machine."

Grab your notebooks and focus, because things are about to get brain-bending!


🎯 Learning Objectives for This Episode

  1. Cognitive Upgrade (The Philosophy): Deeply understand the underlying logic and use cases of Flat (peer-to-peer network) vs. Hierarchical (pyramid) architectures.
  2. Architecture Refactoring (The Technique): Master the standard design pattern of introducing a Supervisor (Router) node in LangGraph.
  3. Practical Implementation (The Application): Use Python + LangGraph to refactor our Content Agency, implementing a complex workflow where the Planner coordinates and delegates tasks to specialized roles.
  4. State Management (The Deep Dive): Solve the Context Bloat problem caused by frequent interactions among multiple Agents.

📖 Theory Breakdown: Makeshift Crew vs. Modern Enterprise

In a Multi-Agent System (MAS), the collaboration topology between Agents dictates the system's ceiling. Let's look at the two most classic architectures.

1. Flat Architecture (Peer-to-Peer)

Imagine a startup: the boss, the product manager, and the developers all sit at the same table. When a problem arises, everyone chimes in and discusses it in a group chat.

  • Mechanism: Agents can communicate directly with each other or speak freely via a shared Blackboard.
  • Pros: No central coordinator to wait on, so a small team gets low coordination overhead and high flexibility. Perfect for brainstorming, role-playing games, debates, and scenarios requiring divergent thinking.
  • Cons: Loss of control. When tasks get complex, it's incredibly easy to fall into an infinite loop of "passing the buck" (e.g., the Writer thinks the data is insufficient and kicks it back to the Researcher; the Researcher thinks the Writer just doesn't understand tech and kicks it back again).
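
The buck-passing loop described above can be reproduced in a few lines of plain Python. This is a toy sketch, not LangGraph code: the `writer` and `researcher` functions, the `PEERS` table, and the `max_hops` guard are hypothetical names invented for illustration.

```python
def writer(note: str) -> tuple[str, str]:
    # Hypothetical peer: always decides the data is insufficient.
    return "researcher", "Need more data"


def researcher(note: str) -> tuple[str, str]:
    # Hypothetical peer: always decides the brief was misunderstood.
    return "writer", "Re-read the brief"


PEERS = {"writer": writer, "researcher": researcher}


def run_flat(start: str, max_hops: int) -> list[str]:
    """Let peers hand work to each other directly; stop after max_hops."""
    trail, current, note = [], start, "initial request"
    for _ in range(max_hops):
        trail.append(current)
        current, note = PEERS[current](note)
    return trail


trail = run_flat("writer", max_hops=6)
print(" -> ".join(trail))
```

Without the `max_hops` guard, nothing in this topology would ever stop the exchange, because no participant has the authority to call the task done.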

2. Hierarchical Architecture (Supervisor-led)

As a company grows, a corporate hierarchy becomes necessary. We need a foreman—a Supervisor.

  • Mechanism: Establish a centralized Supervisor (in our project, the Planner). Worker Agents (Researcher, Writer, Editor) never communicate directly with each other. Once a Worker finishes a task, it must report the results back to the Supervisor. The Supervisor then decides whether to hand the next step to another Worker or to finish the task.
  • Pros: Highly orderly, logically rigorous, and extremely suitable for goal-oriented, complex task decomposition.
  • Cons: The Supervisor becomes a performance bottleneck. If the manager is "dumb" (poorly written Prompts or weak LLM capabilities), the entire team is paralyzed.
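
The hub-and-spoke mechanism above can be sketched in plain Python before any LangGraph is involved. Everything here is an assumption made for illustration: the `supervisor` function is a rule-based stand-in for the LLM, and the three worker functions return canned strings.

```python
from typing import Callable


def researcher(history: list[str]) -> str:
    return "Researcher: facts gathered"


def writer(history: list[str]) -> str:
    return "Writer: draft written"


def editor(history: list[str]) -> str:
    return "Editor: draft approved"


WORKERS: dict[str, Callable[[list[str]], str]] = {
    "Researcher": researcher,
    "Writer": writer,
    "Editor": editor,
}


def supervisor(history: list[str]) -> str:
    """Rule-based stand-in for the LLM: pick the first worker with no report yet."""
    for name in WORKERS:
        if not any(entry.startswith(name + ":") for entry in history):
            return name
    return "FINISH"


def run_hierarchy(request: str) -> list[str]:
    history = [f"Client: {request}"]
    while (choice := supervisor(history)) != "FINISH":
        # Workers never call each other; every result goes back to the supervisor.
        history.append(WORKERS[choice](history))
    return history


history = run_hierarchy("Write a Vision Pro analysis")
print("\n".join(history))
```

Note that every decision passes through `supervisor`, which is exactly why a weak supervisor stalls the whole team.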

For our AI Content Agency, facing multi-channel, multi-format content generation demands, the Hierarchical architecture is the better fit.

The diagram below visually demonstrates the architectural shift we are refactoring today:

graph TD
    subgraph "❌ Past Pain Points: Flat/Linear Architecture (Makeshift Crew)"
        R1[Researcher] -->|Data/Info| W1[Writer]
        W1 -->|First Draft| E1[Editor]
        E1 -->|Return for Revision?| W1
        style R1 fill:#ffcccc,stroke:#333,stroke-width:2px
        style W1 fill:#ffcccc,stroke:#333,stroke-width:2px
        style E1 fill:#ffcccc,stroke:#333,stroke-width:2px
    end

    subgraph "✅ Today's Refactoring: Hierarchical Architecture (Well-Oiled Machine)"
        User((Client Request)) --> S{Planner\n(Supervisor)}
        
        S -->|1. Assign Research Task| R2[Researcher]
        R2 -.->|2. Submit Report| S
        
        S -->|3. Assign Writing Task| W2[Writer]
        W2 -.->|4. Submit Draft| S
        
        S -->|5. Assign Editing Task| E2[Editor]
        E2 -.->|6. Submit Final Draft| S
        
        S -->|7. All Tasks Completed| FINISH(((Final Output)))
        
        style S fill:#ff9900,stroke:#333,stroke-width:4px,color:#fff
        style R2 fill:#cce5ff,stroke:#333,stroke-width:2px
        style W2 fill:#cce5ff,stroke:#333,stroke-width:2px
        style E2 fill:#cce5ff,stroke:#333,stroke-width:2px
        style FINISH fill:#ccffcc,stroke:#333,stroke-width:2px
    end

Make sense? In the new architecture, the Planner becomes the Routing Hub of the entire Graph. It doesn't do the grunt work; it only does two things: Think and Delegate.


💻 Practical Code Walkthrough

Enough theory—let's dive straight into the code. We will use LangGraph's StateGraph to build this hierarchical network. To ensure stable routing decisions, we will arm our Supervisor with OpenAI's Structured Output feature.

1. Environment Setup and State Definition

First, we need to define the agency's "shared ledger"—the AgencyState.

import operator
from typing import Annotated, Sequence, TypedDict
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from langgraph.graph import StateGraph, START, END

# 1. Define the system state (The State of the Agency)
# Inherit from TypedDict, use operator.add to append messages
class AgencyState(TypedDict):
    # Record all historical conversations and work artifacts
    messages: Annotated[Sequence[BaseMessage], operator.add]
    # Record who should take over the next task
    next_agent: str 
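
To see why the `Annotated[..., operator.add]` hint matters, here is a rough stdlib-only approximation of the merge behavior: a field with a reducer accumulates, a field without one is overwritten. This is a sketch for intuition, not LangGraph's actual implementation; `DemoState` and `apply_update` are hypothetical names.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class DemoState(TypedDict):
    messages: Annotated[list, operator.add]  # reducer present: updates are appended
    next_agent: str                          # no reducer: updates overwrite


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state, honoring Annotated reducers."""
    hints = get_type_hints(DemoState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        merged[key] = reducer(state[key], value) if reducer else value
    return merged


state = {"messages": ["task"], "next_agent": ""}
state = apply_update(state, {"next_agent": "Researcher"})
state = apply_update(state, {"messages": ["research notes"]})
print(state)
```

This is why the nodes in this lesson return only the keys they change: a worker returns a one-element `messages` list and the history grows by one entry.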

2. Building the Core Brain: Supervisor (Planner)

This is the soul of today's code. We will force the LLM to output a specific JSON structure, telling LangGraph where to go next.

# 2. Define the Supervisor's routing structure (Structured Output)
from typing import Literal

# Our agency currently has three workers
MEMBERS = ["Researcher", "Writer", "Editor"]

class RouterDecision(BaseModel):
    """Supervisor decides who should act next."""
    # Literal restricts the schema to these options, so the LLM cannot invent a new agent name
    next_agent: Literal["Researcher", "Writer", "Editor", "FINISH"] = Field(
        description="The next agent to act. Choose from 'Researcher', 'Writer', 'Editor', or 'FINISH' if the whole project is done."
    )
    reasoning: str = Field(
        description="Brief explanation of why this agent was chosen."
    )

# Initialize a smart brain (gpt-4o recommended, the supervisor shouldn't be dumb)
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Bind the Pydantic model using with_structured_output
supervisor_chain = llm.with_structured_output(RouterDecision)

# Write the logic for the Supervisor node
def supervisor_node(state: AgencyState):
    """
    Supervisor Node: Analyzes the current state and routes to the next agent.
    """
    system_prompt = (
        "You are the Chief Planner of an elite AI Content Agency.\n"
        "Your team members are: {members}.\n"
        "Your job is to read the conversation/work history and decide who should act next.\n"
        "Rule 1: Always start with 'Researcher' to gather facts.\n"
        "Rule 2: Pass to 'Writer' to draft the content based on research.\n"
        "Rule 3: Pass to 'Editor' to review and refine the draft.\n"
        "Rule 4: If the Editor has approved the final content and all requirements are met, output 'FINISH'.\n"
        "DO NOT do the work yourself. Just route it."
    ).format(members=", ".join(MEMBERS))
    
    messages = [{"role": "system", "content": system_prompt}] + state["messages"]
    
    # Call the LLM to make a decision
    print("🧠 [Planner] is thinking...")
    decision = supervisor_chain.invoke(messages)
    print(f"🎯 [Planner] Decision: Next up is -> {decision.next_agent}. Reason: {decision.reasoning}")
    
    # Key point: Return the updated state, especially the next_agent field
    return {"next_agent": decision.next_agent}
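
Even with structured output, it can be worth guarding the routing value before it reaches the graph. The following is a minimal defensive sketch in plain Python; `normalize_decision`, `ALLOWED`, and the choice of `"FINISH"` as the fallback are assumptions for illustration, not part of LangGraph or of the lesson's code.

```python
ALLOWED = {"Researcher", "Writer", "Editor", "FINISH"}


def normalize_decision(raw: str, fallback: str = "FINISH") -> str:
    """Map a raw routing string onto an allowed target, or use the fallback."""
    cleaned = raw.strip().strip("'\"")
    for option in ALLOWED:
        if cleaned.lower() == option.lower():
            return option
    return fallback


print(normalize_decision(" writer "))
print(normalize_decision("Designer"))
```

Falling back to `"FINISH"` ends the run early instead of handing the router a name it has no edge for; raising an error is an equally reasonable choice if you prefer to fail loudly.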

3. Defining the Worker Nodes

For clarity in this demonstration, we use simple Prompts to simulate the work of these three experts. In real-world production code, you would equip them with their respective Tools (e.g., giving the Researcher a Tavily search tool, and the Writer a Markdown formatting tool).

# 3. Define Worker Nodes
# Helper function: Wrap the Worker's output into an AIMessage and identify the role
def worker_node_factory(role_name: str, system_instruction: str):
    # Create the worker's LLM once per role instead of on every node call
    worker_llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)

    def node(state: AgencyState):
        print(f"🛠️ [{role_name}] is working on the task...")
        messages = [{"role": "system", "content": system_instruction}] + list(state["messages"])
        response = worker_llm.invoke(messages)
        
        # Must add the name to the message so the Supervisor knows who did it
        final_message = AIMessage(
            content=response.content, 
            name=role_name # Mark the source of the message
        )
        return {"messages": [final_message]}
    return node

# Instantiate the three workers
researcher_node = worker_node_factory(
    "Researcher", 
    "You are a meticulous Researcher. Read the request, provide detailed bullet points of facts and data. Do not write the final article."
)

writer_node = worker_node_factory(
    "Writer", 
    "You are an expert Writer. Take the research provided by the Researcher and draft a compelling piece of content based on the user's request."
)

editor_node = worker_node_factory(
    "Editor", 
    "You are a strict Editor. Review the Writer's draft. Fix any typos, improve the tone, and output the FINAL polished version. Explicitly state 'FINAL VERSION APPROVED' at the end."
)

4. Assembling the Graph: Building the Pyramid

Now, we need to connect the Supervisor and the Workers using LangGraph Edges. This is the core manifestation of the Hierarchical architecture.

# 4. Build the LangGraph
workflow = StateGraph(AgencyState)

# Add all nodes
workflow.add_node("Supervisor", supervisor_node)
workflow.add_node("Researcher", researcher_node)
workflow.add_node("Writer", writer_node)
workflow.add_node("Editor", editor_node)

# After all Workers finish their work, they must unconditionally report back to the Supervisor
for member in MEMBERS:
    workflow.add_edge(member, "Supervisor")

# Define conditional routing logic
def router(state: AgencyState):
    # Determine the next step based on the next_agent set by the Supervisor
    next_node = state["next_agent"]
    if next_node == "FINISH":
        return END
    return next_node

# The Supervisor's next step is a Conditional Edge
workflow.add_conditional_edges(
    "Supervisor", # Starting point
    router,       # Routing function
    {
        "Researcher": "Researcher",
        "Writer": "Writer",
        "Editor": "Editor",
        END: END
    }
)

# Set the graph entry point: When a task comes in, go to the Planner (Supervisor) first
workflow.add_edge(START, "Supervisor")

# Compile the graph
agency_app = workflow.compile()
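
The routing dictionary above repeats every name in `MEMBERS` by hand, so adding a fourth worker means editing two places. One way to keep them in sync is to derive the map from the list. This is a stdlib-only sketch: `build_route_map`, `route`, and `END_SENTINEL` (a stand-in for LangGraph's `END` constant) are hypothetical names.

```python
MEMBERS = ["Researcher", "Writer", "Editor"]
END_SENTINEL = "__end__"  # stand-in for langgraph.graph.END in this sketch


def build_route_map(members: list[str]) -> dict[str, str]:
    """One entry per worker, plus FINISH mapped to the end of the graph."""
    route_map = {name: name for name in members}
    route_map["FINISH"] = END_SENTINEL
    return route_map


def route(next_agent: str, route_map: dict[str, str]) -> str:
    """Resolve the supervisor's choice, failing loudly on unknown targets."""
    if next_agent not in route_map:
        raise ValueError(f"Unknown routing target: {next_agent!r}")
    return route_map[next_agent]


route_map = build_route_map(MEMBERS)
print(route("Writer", route_map), route("FINISH", route_map))
```

With a map shaped like this, the router function could return the supervisor's string unchanged and let the map translate `"FINISH"` into the end node.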

5. Running the Demo

Let's see how this "well-oiled machine" handles a complex task.

# 5. Run the test
if __name__ == "__main__":
    task_prompt = "Write a short, engaging tweet about the latest breakthrough in Quantum Computing. Make it accessible to the public."
    
    print(f"🚀 User Request: {task_prompt}\n")
    print("-" * 50)
    
    initial_state = {
        "messages": [HumanMessage(content=task_prompt)],
        "next_agent": ""
    }
    
    # Set recursion limit to prevent infinite loops
    config = {"recursion_limit": 15}
    
    for chunk in agency_app.stream(initial_state, config=config):
        # Print the name of the currently completed node
        if "__end__" not in chunk:
            node_name = list(chunk.keys())[0]
            print(f"✅ [{node_name}] finished its turn.\n")
            print("-" * 50)
            
    print("🎉 Project Completed Successfully!")

Expected Terminal Output Simulation:

🚀 User Request: Write a short, engaging tweet about the latest breakthrough in Quantum Computing...

🧠 [Planner] is thinking...
🎯 [Planner] Decision: Next up is -> Researcher. Reason: Need to gather the latest facts on quantum computing breakthroughs first.
✅ [Supervisor] finished its turn.

🛠️ [Researcher] is working on the task...
✅ [Researcher] finished its turn.

🧠 [Planner] is thinking...
🎯 [Planner] Decision: Next up is -> Writer. Reason: Research is complete, now we need to draft the tweet.
✅ [Supervisor] finished its turn.

... (Continues until FINISH)
🎉 Project Completed Successfully!


💣 Pitfalls & Survival Guide (An Advanced Developer's Debugging Guide)

As a veteran with 10 years of experience, I must warn you: running the demo above is just the first step. In a real production environment, the Hierarchical architecture has a few fatal pitfalls. Step into them, and you're in for a world of pain.

Pitfall 1: The Supervisor Infinite Loop

Symptom: The Editor thinks the article is bad and kicks it back to the Writer; the Writer tweaks it and sends it back to the Editor, who is still unhappy. They keep passing the buck back and forth through the Supervisor, burning through your API token budget with every round.

How to Avoid:

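  1. Hard Stop: Set a recursion_limit (like the 15 in our code) in the config you pass to invoke() or stream(); it is a run-time setting, not an argument to workflow.compile(). When the limit is hit, LangGraph raises a GraphRecursionError, so catch it if you want the run to fail gracefully.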
  2. Prompt Constraints: Explicitly state in the Supervisor's Prompt: "A maximum of 2 revisions is allowed. If exceeded, force output FINISH."
  3. State Injection: Add a revision_count: int field to the AgencyState. Increment it by 1 every time a draft is rejected. Once it hits the threshold, route directly to END.
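
The State Injection idea can be sketched without LangGraph. The names `ReviewState`, `record_rejection`, `capped_router`, and `MAX_REVISIONS` are hypothetical; in the real graph the counter would live in AgencyState and the check would sit in the routing function.

```python
from typing import TypedDict

MAX_REVISIONS = 2  # same budget as the prompt constraint above


class ReviewState(TypedDict):
    next_agent: str
    revision_count: int


def record_rejection(state: ReviewState) -> dict:
    """Partial update an Editor-like node could return when it rejects a draft."""
    return {"revision_count": state["revision_count"] + 1}


def capped_router(state: ReviewState) -> str:
    """Force FINISH once the revision budget is spent, whatever the supervisor chose."""
    if state["revision_count"] >= MAX_REVISIONS:
        return "FINISH"
    return state["next_agent"]


state: ReviewState = {"next_agent": "Writer", "revision_count": 0}
hops = []
while (target := capped_router(state)) != "FINISH":
    hops.append(target)
    state = {**state, **record_rejection(state)}  # simulate a rejected draft
print(hops, state["revision_count"])
```

Unlike the prompt constraint, this check does not depend on the LLM counting correctly, which is why it makes a good second line of defense.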

Pitfall 2: Context Bloat

Symptom: Because all Workers must report their results to the Supervisor, the messages list grows longer and longer. By the 10th round of interaction, every LLM call carries tens of thousands of tokens of useless chatter.

How to Avoid: Do not blindly use operator.add to accumulate all messages. Introduce a State Summarization mechanism. Alternatively, separate the State into a scratchpad (for drafting, cleared periodically) and final_artifacts (the final deliverables, keeping only the latest version).
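
As a starting point for taming the history, here is a minimal sketch that keeps the original request and the most recent turns and collapses everything in between. It works on plain strings for simplicity; `compact_history` and `keep_last` are hypothetical names, and the placeholder line stands in for a real LLM-written summary.

```python
def compact_history(messages: list[str], keep_last: int = 4) -> list[str]:
    """Keep the original request plus the most recent turns; collapse the middle."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(messages) <= keep_last + 1:
        return list(messages)
    dropped = len(messages) - keep_last - 1
    placeholder = f"[{dropped} earlier messages summarized]"
    return [messages[0], placeholder, *messages[-keep_last:]]


history = ["task"] + [f"turn {i}" for i in range(1, 11)]
compact = compact_history(history, keep_last=3)
print(compact)
```

One caveat: with an operator.add reducer on messages, a node cannot shrink the stored list by returning a shorter one, because the return value is appended. A function like this would be applied to the copy of the history sent to the LLM, or the state field would need a different reducer.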

Pitfall 3: Using a Sledgehammer to Crack a Nut (Over-engineering)

Symptom: The client simply asks, "Translate this sentence into English," but you trigger the entire Planner -> Researcher -> Writer -> Editor workflow, taking 30 seconds and costing $0.10.

How to Avoid: There is no absolutely "good" or "bad" architecture, only what fits. If 80% of your business consists of simple tasks, add a Triage Node in front of the Supervisor. For simple tasks, call a lightweight model for a single-shot output; for complex tasks, route them into the hierarchical network.
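
A Triage Node does not have to be an LLM call. The sketch below uses a crude keyword-and-length heuristic to decide between a fast path and the full hierarchy. The names `triage`, `SIMPLE_VERBS`, `fast_path`, and the 30-word threshold are all assumptions for illustration; a production triage step would more likely use a small classifier model.

```python
SIMPLE_VERBS = ("translate", "summarize", "rephrase", "proofread")


def triage(request: str, max_simple_words: int = 30) -> str:
    """Return 'fast_path' for short single-step requests, else 'supervisor'."""
    text = request.lower().strip()
    is_short = len(text.split()) <= max_simple_words
    starts_simple = text.startswith(SIMPLE_VERBS)
    return "fast_path" if is_short and starts_simple else "supervisor"


print(triage("Translate this sentence into English"))
print(triage("Write an in-depth industry analysis, a lifestyle post, and a Twitter thread"))
```

In the graph, the return value would drive a conditional edge from the entry point, with the fast path going to a single lightweight node and everything else going to the Supervisor.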


📝 Episode Summary

Today, we completed an epic refactoring.

We explored the flexibility and chaos of the Flat architecture, as well as the rigor and control of the Hierarchical architecture. By leveraging LangGraph's add_conditional_edges and the LLM's structured output capabilities, we successfully built a Supervisor (Planner) capable of independent thinking and task delegation.

Your AI Content Agency is no longer a scattered group of freelancers working in silos, but a modernized AI production assembly line commanded by a central brain, with everyone performing their specialized roles.

A question to ponder: In today's code, the Supervisor can only dispatch tasks to one Agent at a time. If the client requests, "Search Google and Baidu simultaneously," can we have two Researchers work in Parallel, and then aggregate their findings for the Writer?

This involves LangGraph's advanced Fan-out / Fan-in mechanisms. Don't worry, we'll cover that in our next episode: Episode 17 | Parallel Processing and Map-Reduce: Giving Your Team the Power of Cloning!

Class dismissed! Make sure you run the code—don't just read it, practice it!