lesson-23

17 MIN READ | UPDATED: 2026-05-07

How do you achieve a typewriter effect on the frontend while the Graph is still running to generate a 10,000-word article?

Welcome back, AI Architects, to our LangGraph Multi-Agent Masterclass. It's your old friend here.

In our last episode, our "AI Content Agency" started taking shape. The Planner strategizes, the Researcher searches relentlessly, the Writer drafts furiously, and the Editor strictly gatekeeps. I'm sure you all felt a rush of satisfaction the moment that entire workflow ran successfully.

However, yesterday a student complained in our group: "Teacher, my Graph runs fine, but when the Writer node starts drafting that 10,000-word deep-dive industry report, my frontend UI freezes for a solid 40 seconds! My boss thought the system crashed and almost had my head on a spike."

This isn't a minor annoyance; it's a severe UX (User Experience) disaster.

In traditional monolithic LLM calls, we've long been accustomed to using streaming=True to achieve that typewriter effect. But in a complex state machine like LangGraph, which consists of multiple Agent nodes, the default streaming output (stream_mode="values" or "updates") operates at the node level. This means it waits until the Writer has squeezed out all 10,000 words and the node finishes executing before tossing the final state back to you.

Can we tolerate this? Absolutely not! Today, we are going to peel back the underlying layers of LangGraph, "hijack" the LLM's token stream from inside the node, and push it directly to the frontend! We will be using today's absolute star: stream_mode="messages".


🎯 Learning Objectives

By the end of this lesson, you will have mastered the following skills:

  1. Breaking Node Barriers: Deeply understand the fundamental difference between Graph-level state streams and LLM-level token streams.
  2. Mastering stream_mode="messages": Learn how to intercept and parse Message Chunks and Metadata emitted by LangGraph's underlying engine.
  3. Precise Routing & Distribution: In a multi-agent collaboration, accurately target and extract only the Writer node's token stream, filtering out the internal thought processes of the Planner and Researcher.
  4. Refactoring the Agency Output Layer: Endow our AI Content Agency with silky-smooth, real-time "typewriter" output capabilities.

📖 Under the Hood

Before we write any code, we need to understand LangGraph's streaming philosophy. Take notes, as this comes up frequently in interviews!

LangGraph provides three main stream_mode options:

  • "values": Every time a node updates, it throws the complete global State back to you.
  • "updates": When a node finishes executing, it throws only the part of the state updated by that node back to you. (This is what we used most in the previous 22 episodes).
  • "messages": Fine-grained listening mode. It no longer waits for the node to finish. Instead, it directly listens to the ChatModel inside the node. The moment the LLM spits out a Token (AIMessageChunk), it immediately throws it out along with the Metadata indicating which node it currently belongs to.

To use an analogy: "updates" is like eating at a restaurant. You have to wait for the chef (Writer) to cook the entire "10,000-word" dish and bring it to your table before you can see your food. "messages" is like pulling up a stool and sitting right next to the chef. Every time he chops an onion (Token), you see it crystal clear.
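To make the restaurant analogy concrete, here is a toy model in plain Python. It does not use LangGraph at all: fake_llm, node_level_stream, and token_level_stream are hypothetical stand-ins I made up to mimic the timing of "updates" versus "messages", not the real engine.

```python
from typing import Iterator


def fake_llm(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a chat model: yields the reply token by token."""
    for token in ["As", " the", " AI", " era", " unfolds"]:
        yield token


def node_level_stream() -> Iterator[dict]:
    """Mimics "updates": nothing is emitted until the node has finished."""
    article = "".join(fake_llm("outline"))  # the whole generation happens here
    yield {"writer": {"final_article": article}}


def token_level_stream() -> Iterator[tuple[str, dict]]:
    """Mimics "messages": each token is emitted as soon as it exists."""
    for token in fake_llm("outline"):
        yield token, {"langgraph_node": "writer"}


updates_events = list(node_level_stream())
message_events = list(token_level_stream())
print(f"updates events: {len(updates_events)}, messages events: {len(message_events)}")
```

Same text in both cases; the only difference is how many events the consumer sees and when it sees them.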

Let's look at a Mermaid diagram to see the workflow behind this:

sequenceDiagram
    participant User as Frontend User
    participant Graph as LangGraph (Agency)
    participant Planner as Planner Node
    participant Writer as Writer Node
    participant LLM as Underlying LLM

    User->>Graph: Submit request: "Write a 10k-word AI report"
    Graph->>Planner: Trigger planning
    Planner-->>Graph: Return outline (Node ends)
    
    Note over User, Graph: If using "updates" mode here,<br/>the frontend only receives one outline update at this moment

    Graph->>Writer: Trigger writing (Highly time-consuming)
    Writer->>LLM: invoke(outline)

    rect rgb(230, 240, 255)
    Note over Graph, LLM: The magic moment of stream_mode="messages"
    LLM-->>Graph: Token 1 ("As")
    Graph-->>User: ⚡️ Real-time push Token 1 (with metadata: node="Writer")
    LLM-->>Graph: Token 2 (" the")
    Graph-->>User: ⚡️ Real-time push Token 2 (with metadata: node="Writer")
    LLM-->>Graph: Token 3 (" AI")
    Graph-->>User: ⚡️ Real-time push Token 3 (with metadata: node="Writer")
    end

    LLM-->>Writer: Generation complete
    Writer-->>Graph: Return full article state (Node ends)
    Graph-->>User: Final state update ends

Make sense? Under stream_mode="messages", the Graph becomes a transparent pipe. Every breath the underlying LLM takes is transmitted to the user in real-time.


💻 Hands-On Code Walkthrough

Enough talk, show me the code. We will build upon our "AI Content Agency" project, extracting the core logic from Planner -> Writer to demonstrate how to implement the typewriter effect.

Step 1: Build the Base Graph and Agents

(To allow you to copy and run this directly, I've condensed the State and Node definitions into a single script. Please ensure you have langgraph and langchain-openai installed.)

import os
from typing import TypedDict, Annotated
from langchain_core.messages import BaseMessage, HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END, START
from langgraph.graph.message import add_messages

# Assuming you have configured environment variables. If not, uncomment and fill in:
# os.environ["OPENAI_API_KEY"] = "sk-..."

# 1. Define our Agency state
class AgencyState(TypedDict):
    # Use add_messages to automatically merge conversation history
    messages: Annotated[list[BaseMessage], add_messages]
    outline: str
    final_article: str

# 2. Initialize the LLM (Note: No need to explicitly set streaming=True, LangGraph handles it under the hood)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)

# 3. Define the Planner node (Responsible for the outline, no streaming to the user)
def planner_node(state: AgencyState):
    print("\n[System Log] Planner is thinking about the outline...")
    prompt = f"Please generate an article outline based on the user's request: {state['messages'][-1].content}"
    response = llm.invoke(prompt)
    return {"outline": response.content}

# 4. Define the Writer node (Responsible for the long-form article, our main target for streaming!)
def writer_node(state: AgencyState):
    print("\n[System Log] Writer is drafting the article based on the outline...")
    prompt = f"You are a senior writer. Please write a detailed article based on the following outline:\n{state['outline']}"
    # Just invoke directly. Streaming interception is handled externally by the Graph's stream method
    response = llm.invoke(prompt)
    return {"messages": [response], "final_article": response.content}

# 5. Assemble the Graph
workflow = StateGraph(AgencyState)
workflow.add_node("planner", planner_node)
workflow.add_node("writer", writer_node)

workflow.add_edge(START, "planner")
workflow.add_edge("planner", "writer")
workflow.add_edge("writer", END)

app = workflow.compile()

Step 2: The Magic Moment (Core Extraction Logic)

Now for the main event. We are going to run this Graph using stream_mode="messages" and extract ONLY the token stream from the Writer node.

Please read the comments in this code block carefully; this is the distilled debugging experience of a 10-year veteran:

from langchain_core.messages import AIMessageChunk

def run_agency_with_streaming(user_input: str):
    print(f"👨‍💻 User Input: {user_input}\n" + "="*50)
    
    inputs = {"messages": [HumanMessage(content=user_input)]}
    
    # Pay attention! Enable stream_mode="messages"
    # This returns a generator, yielding a tuple each time: (message_chunk, metadata)
    stream = app.stream(inputs, stream_mode="messages")
    
    print("✍️ Frontend typewriter effect starts:\n")
    
    for chunk, metadata in stream:
        # metadata is a dictionary containing highly valuable info, like which node the current Token comes from
        # It looks roughly like this (the exact keys vary by LangGraph version):
        # {'langgraph_step': 2, 'langgraph_node': 'writer', 'langgraph_triggers': [...], ...}
        
        node_name = metadata.get("langgraph_node")
        
        # [Filtering Strategy 1]: We only care about the Writer node's output
        # Because the Planner also calls the LLM. Without this check, the frontend would print the outline as the article!
        if node_name == "writer":
            
            # [Filtering Strategy 2]: Ensure this is an AIMessageChunk (text chunk generated by the model)
            # Because messages mode can also deliver complete (non-chunk) messages that a node writes into state
            if isinstance(chunk, AIMessageChunk):
                
                # [Filtering Strategy 3]: Extract the actual text content
                # Sometimes the model might be making a Tool Call, in which case content is empty
                if chunk.content:
                    # Simulate frontend typewriter: print without newline and flush the buffer immediately
                    print(chunk.content, end="", flush=True)

    print("\n\n" + "="*50 + "\n✅ Article generation complete!")

# Run test
if __name__ == "__main__":
    run_agency_with_streaming("Please write a short article about the development trends of AI in 2024, divided into three paragraphs.")

When you run this code, you will see the console first print [System Log] Planner is thinking about the outline.... At this point, the UI is quiet (because we shielded the Planner's stream). Immediately after, it prints [System Log] Writer is drafting the article based on the outline..., and then, the article flows onto your screen word by word, exactly as if someone were frantically typing on a keyboard!
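Printing to the console is only a simulation, of course. To actually push tokens to a browser you need a transport. The lesson doesn't prescribe one, so the sketch below assumes Server-Sent Events (my choice, not something from the Agency project). FakeChunk and to_sse are hypothetical names; FakeChunk stands in for AIMessageChunk so the snippet runs without any LangChain install.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeChunk:
    """Hypothetical stand-in for AIMessageChunk; only .content is used here."""
    content: str


def to_sse(stream: Iterable[tuple], node: str = "writer") -> Iterator[str]:
    """Turn (chunk, metadata) pairs into Server-Sent Events frames."""
    for chunk, metadata in stream:
        if metadata.get("langgraph_node") != node:
            continue  # skip tokens from other nodes, e.g. the planner
        if not isinstance(chunk.content, str) or not chunk.content:
            continue  # skip empty or non-text chunks
        # JSON-encoding keeps newlines inside a token from breaking the frame
        payload = json.dumps({"token": chunk.content})
        yield f"data: {payload}\n\n"
    # Tell the client the article is complete
    yield "event: done\ndata: {}\n\n"


fake_stream = [
    (FakeChunk("1. Intro"), {"langgraph_node": "planner"}),
    (FakeChunk("As"), {"langgraph_node": "writer"}),
    (FakeChunk(""), {"langgraph_node": "writer"}),
    (FakeChunk(" the\n"), {"langgraph_node": "writer"}),
]
frames = list(to_sse(fake_stream))
for frame in frames:
    print(repr(frame))
```

With the real graph you would pass app.stream(inputs, stream_mode="messages") in place of fake_stream and return the generator from your web framework's streaming response with the text/event-stream content type.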

This is true production-grade UX.


Gotchas & Troubleshooting Guide (Senior-Level Debugging Experience)

As your mentor, I can't just teach you how to get a Demo running; I also need to tell you about the ghost stories you'll encounter in production environments (like when deploying that multi-million dollar project for a client).

🚨 Gotcha 1: Frontend Crashes Caused by Tool Calls

The Crash: When your Writer Agent can call external search tools (e.g., it realizes halfway through writing that it lacks data and calls Google Search), the chunks the LLM emits during the tool call typically have an empty chunk.content! What the model is actually generating is chunk.tool_call_chunks. If your frontend blindly reads chunk.content and assumes it is always non-empty text, it can raise a type error or crash your React/Vue components.

Pro Fix: Always add defensive programming in your extraction logic:

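if chunk.content:
    # Normal text, send to frontend
    send_to_frontend(chunk.content)
elif chunk.tool_call_chunks:
    # Generating tool call arguments. You can show a "Retrieving data..." animation on the frontend
    # Usually only the first chunk of a tool call carries the tool name; later chunks hold argument fragments
    tool_name = chunk.tool_call_chunks[0].get("name")
    if tool_name:
        show_loading_animation(tool_name)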
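One more defensive layer worth considering: LangChain types message content as either a string or a list of strings and dicts, and some providers stream a list of content blocks instead of plain text. Here is a minimal sketch of a normalizer. chunk_text is a hypothetical helper, and the {"type": "text", "text": ...} block shape is an assumption based on the commonly seen format, so verify it against your provider.

```python
from typing import Union


def chunk_text(content: Union[str, list, None]) -> str:
    """Return the plain text carried by a message chunk's content."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, str):
                parts.append(block)
            elif isinstance(block, dict) and block.get("type") == "text":
                # Assumed block shape: {"type": "text", "text": "..."}
                parts.append(block.get("text", ""))
            # Any other block type (tool use, images, ...) is ignored
        return "".join(parts)
    return ""  # None or an unexpected type: nothing to display


print(repr(chunk_text("As the")))
print(repr(chunk_text([{"type": "text", "text": "As"}, {"type": "tool_use", "id": "x"}, " the"])))
```

In the extraction loop you would then call chunk_text(chunk.content) and send the result to the frontend only if it is non-empty.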

🚨 Gotcha 2: The Ghostly "Double Output"

The Crash: Some students, wanting both streaming output and the final state, write their code as stream_mode=["messages", "updates"]. As a result, the frontend shows the typewriter output and then instantly receives the entire article a second time. The duplicate is the Writer's "updates" event, which carries the complete article once the node finishes.

Pro Fix: If you use multiple stream_modes simultaneously, the tuple returned by app.stream changes to (stream_mode_name, payload). You must route the stream by checking the first element, and render article text from only one of the two modes:

for event_type, payload in app.stream(inputs, stream_mode=["messages", "updates"]):
    if event_type == "messages":
        chunk, metadata = payload
        # Handle typewriter logic...
    elif event_type == "updates":
        # Handle node state update logic (e.g., updating a progress bar in the sidebar)
        pass
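Here is a runnable sketch of that routing idea against a hand-built event list, so you can test the logic without calling a model. route_events and the fake events are hypothetical; SimpleNamespace stands in for AIMessageChunk. The payload shapes follow the (mode, payload) convention described above, with "updates" payloads mapping node name to that node's state update.

```python
from types import SimpleNamespace
from typing import Iterable


def route_events(events: Iterable[tuple]) -> dict:
    """Split a multi-mode stream into typed text and per-node progress."""
    typed = []     # tokens shown by the typewriter
    finished = []  # nodes that have completed, e.g. for a progress indicator
    for mode, payload in events:
        if mode == "messages":
            chunk, metadata = payload
            if metadata.get("langgraph_node") == "writer" and chunk.content:
                typed.append(chunk.content)
        elif mode == "updates":
            # Record only the node names. The article text was already typed
            # token by token, so rendering it again would duplicate it.
            finished.extend(payload.keys())
    return {"text": "".join(typed), "finished": finished}


fake_events = [
    ("updates", {"planner": {"outline": "1. Intro"}}),
    ("messages", (SimpleNamespace(content="As"), {"langgraph_node": "writer"})),
    ("messages", (SimpleNamespace(content=" the"), {"langgraph_node": "writer"})),
    ("updates", {"writer": {"final_article": "As the"}}),
]
result = route_events(fake_events)
print(result)
```

The design choice is simple: "messages" owns the article text, "updates" owns progress. Each piece of UI has exactly one source, so nothing gets rendered twice.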

🚨 Gotcha 3: Trying to Read Graph State During Streaming

The Crash: During stream_mode="messages", some students try to fetch the current global State using app.get_state(config) (which requires the Graph to be compiled with a checkpointer), only to find it hasn't updated at all!

Pro Fix: Remember the "philosophy" we discussed at the beginning. The messages mode intercepts the real-time output of the underlying LLM. At this moment, the Writer node has not finished executing! LangGraph only applies a node's state update after the node returns. Therefore, during streaming output, the global State is still the state from when the previous node (Planner) finished. Do not attempt to read the current node's final State during token streaming.
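If you genuinely need the partial article mid-stream (say, for a live word count), one option is to accumulate the tokens yourself instead of polling graph state. A minimal sketch follows; stream_with_draft is a hypothetical helper, and SimpleNamespace again stands in for AIMessageChunk.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def stream_with_draft(stream: Iterable[tuple], node: str = "writer") -> Iterator[tuple[str, str]]:
    """Yield (token, draft_so_far) so callers never need to poll graph state."""
    draft = ""
    for chunk, metadata in stream:
        if metadata.get("langgraph_node") != node or not chunk.content:
            continue  # ignore other nodes and empty chunks
        draft += chunk.content
        yield chunk.content, draft


fake_stream = [
    (SimpleNamespace(content="1. Intro"), {"langgraph_node": "planner"}),
    (SimpleNamespace(content="As"), {"langgraph_node": "writer"}),
    (SimpleNamespace(content=" the"), {"langgraph_node": "writer"}),
    (SimpleNamespace(content=" AI"), {"langgraph_node": "writer"}),
]
steps = list(stream_with_draft(fake_stream))
for token, draft in steps:
    print(f"{token!r:8} -> {draft!r}")
```

The authoritative final_article still comes from the state after the Writer node returns; the local draft is only a live preview.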


📝 Episode Summary

Today, we solved a highly commercially valuable pain point: the UX blocking issue in long-text generation.

We didn't modify any internal code logic of the Agents. Simply by switching to stream_mode="messages" at the Graph invocation layer, combined with precise metadata routing and filtering, we endowed our AI Content Agency with seamless, real-time feedback capabilities.

This is the beauty of architectural design: Decoupling the underlying logic brings extreme flexibility to the top-level presentation.

Homework: Try modifying today's code so that the process of the Planner node generating the outline is also displayed on the frontend in a different color or another UI component (like a sidebar) using a typewriter effect. Hint: You'll need to modify the node_name == "writer" conditional logic.

In our next episode (Episode 24), we will tackle the most challenging part of the entire Agency project—Human-in-the-loop. If the Editor thinks the article is terrible, how do we pause the Graph, wait for the human Editor-in-Chief (that's you) to manually modify the outline, and then have the Writer regenerate it?

See you next time! Stay passionate, and keep Coding!