Lesson 17 | Logging System & Observability Practices

20 MIN READ | UPDATED: 2026-05-07


Dive deep into Hermes' logging system – tracing conversation history, tracking Skill execution, and diagnosing/debugging errors. Integrate with LLM observability platforms like Langfuse/LangSmith for end-to-end monitoring.

🎯 Learning Objectives

Understand Hermes Agent's built-in logging system, including log levels, storage locations, and output formats.
Master using Hermes Agent's logs to trace conversation history, track Skill execution flow, and diagnose errors.
Learn the core concepts of LLM Observability and the value of platforms like Langfuse/LangSmith.
Explore strategies for integrating or synchronizing Hermes Agent's operational data with external LLM observability platforms.

📖 Core Concepts Explained

17.1 Hermes Agent's Built-in Logging System

As a self-evolving AI agent, Hermes Agent generates a large amount of internal information during its operation. This information is crucial for understanding the agent's behavior, debugging issues, and optimizing performance. Hermes Agent features a comprehensive built-in logging system that records every critical stage from startup to task completion.

Hermes Agent's logs primarily record the following types of information:

Agent Thoughts: These are the core records of the agent's decision-making, planning, and reflection. They include the agent's understanding of the current task, its reasons for selecting particular tools/Skills, and its plans for the next action. These logs are key to understanding the agent's 'mindset'.
LLM Calls: Every interaction between the agent and a Large Language Model (LLM) is recorded, including the prompt sent, the response received, the model name, and Token consumption. This is very useful for analyzing LLM performance and cost and for debugging prompts.
Skill Execution: When the agent decides to invoke a Skill, its invocation process, input parameters, execution result (success or failure), and any error messages are recorded in detail. This is closely related to the Skill lifecycle we discussed in Lesson 03 | Deep Dive into the Skills System.
Memory Operations: Hermes Agent has cross-session memory capabilities (see Lesson 04 | Memory and User Profiles). Operations such as reading, writing, and updating memory are also recorded, helping to understand how the agent utilizes historical information.
System Events & Errors: This includes agent startup/shutdown, configuration loading, external service connection status, and any exceptions or errors that occur during operation.

Log Levels and Configuration:

Hermes Agent supports the standard log levels DEBUG, INFO, WARNING, ERROR, and CRITICAL (listed from most to least verbose). The default is usually INFO. Use the hermes config set command to switch to a more detailed or a more concise level.

INFO: Provides high-level operational information, such as task start, task completion, and major decision points.
DEBUG: Provides the most detailed internal operational information, including full LLM prompts and responses, detailed Skill execution steps, and intermediate thought processes. This is very useful during development and debugging.
WARNING, ERROR, CRITICAL: Used to report potential issues, errors, and critical failures.
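Assuming Hermes follows the usual meaning of these level names (a threshold: messages at the configured level and above are shown, everything below is dropped), the filtering behaves like Python's standard logging module. A minimal sketch using only the standard library; the logger name hermes.demo is made up for illustration and is not Hermes' own logger:

```python
import logging


class ListHandler(logging.Handler):
    """Collect 'LEVEL: message' strings in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}: {record.getMessage()}")


def emitted_at(level):
    """Log one message per level and return those that pass the threshold."""
    # "hermes.demo" is a made-up logger name used only for this illustration
    logger = logging.getLogger(f"hermes.demo.{logging.getLevelName(level)}")
    logger.setLevel(level)
    logger.propagate = False  # keep demo records away from the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        logger.debug("full LLM prompt and response")
        logger.info("task started")
        logger.warning("skill retry")
        logger.error("skill failed")
    finally:
        logger.removeHandler(handler)
    return handler.messages


print(emitted_at(logging.INFO))   # the DEBUG record is filtered out
print(emitted_at(logging.DEBUG))  # all four records pass
```

At INFO only three messages come through; at DEBUG the full prompt/response message appears as well, which is why DEBUG is the level to use when you need complete LLM and Skill details.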

Log Output Location:

Hermes Agent's logs are output to standard output (stdout) by default, meaning you will see log information directly when running hermes commands in the terminal. In production environments, logs are typically redirected to files or collected via log management tools.

```
# View current log level configuration
hermes config get log_level

# Set log level to DEBUG to get the most detailed output
# This is crucial for debugging and deeply understanding agent behavior
hermes config set log_level debug

# Run a Hermes Agent task and observe the detailed DEBUG log output
# For example, ask the Agent to write a simple Python function
hermes "Write a Python function to calculate the nth Fibonacci number."
```

By setting the log level to DEBUG, you will be able to see every thought step of the agent, the complete input and output of the LLM, and the detailed process of Skill invocation. This provides us with powerful observability, helping us understand the agent's decision path and potential issues.

17.2 Log Interpretation and Problem Diagnosis

A deep understanding of Hermes Agent's logs is key to efficient use and debugging. Logs not only record the agent's 'footprints' but also serve as our 'detective tool' for tracing issues and optimizing behavior.

Tracing Conversation History and Agent Thoughts:

When Hermes Agent runs, it performs a series of thoughts, plans, and executions. These processes are clearly visible in DEBUG level logs.

Agent Thoughts: Logs will contain markers like [AGENT THOUGHT] or [PLAN], followed by the agent's internal monologue. For example, the agent might say: "I need a Skill to generate code, and then test it."
LLM Interaction: You will see [LLM CALL] or [PROMPT] markers, followed by the complete prompt sent to the LLM, and [LLM RESPONSE] or [RESPONSE] markers, followed by the LLM's response. This is very important for analyzing prompt effectiveness and debugging LLM behavior.

Example log snippet (DEBUG level):
```
[2023-10-27 10:30:05.123 DEBUG] [AGENT] Current Task: Write a Python function to calculate the nth Fibonacci number.
[2023-10-27 10:30:05.124 DEBUG] [AGENT THOUGHT] I need to think about how to complete this task. First, I need a tool that can generate Python code.
[2023-10-27 10:30:05.125 DEBUG] [LLM CALL] Sending prompt to LLM (model: gpt-4-turbo-preview):
---
You are a helpful AI assistant.
... (truncated prompt for brevity) ...
User: Write a Python function to calculate the nth Fibonacci number.
---
[2023-10-27 10:30:07.890 DEBUG] [LLM RESPONSE] Received response from LLM:
---
<tool_code>
print_python_code("def fibonacci(n):\n    a, b = 0, 1\n    for _ in range(n):\n        a, b = b, a + b\n    return a\n")
</tool_code>
---
[2023-10-27 10:30:07.891 DEBUG] [AGENT] Tool call detected: print_python_code
[2023-10-27 10:30:07.892 DEBUG] [SKILL EXECUTION] Calling skill 'print_python_code' with args: {'code': 'def fibonacci(n):\n    ...'}
[2023-10-27 10:30:07.895 INFO] [SKILL] Skill 'print_python_code' executed successfully.
```

Through these logs, we can clearly see how the agent receives a task, how it thinks, how it interacts with the LLM, and which Skill it ultimately invoked to complete the task.
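Because every entry in the snippet above has the same shape (timestamp and level in the first bracket, a tag such as AGENT THOUGHT or LLM CALL in the second), a few lines of Python can split entries into fields for searching and filtering. This sketch assumes exactly the format shown in this lesson's examples; check your own Hermes output before relying on the pattern:

```python
import re

# Pattern for the header format used in this lesson's example logs (an assumption;
# adjust it if your Hermes version prints a different layout).
LOG_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<level>[A-Z]+)\]"
    r" \[(?P<tag>[A-Z ]+)\] (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of ts/level/tag/message, or None for continuation lines."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


entry = parse_line(
    "[2023-10-27 10:30:07.895 INFO] [SKILL] Skill 'print_python_code' executed successfully."
)
print(entry["level"], entry["tag"], entry["message"])
# INFO SKILL Skill 'print_python_code' executed successfully.
print(parse_line("---"))  # None: separator lines belong to the previous entry
```

Lines that return None (prompt bodies, LLM responses, tracebacks, the --- separators) belong to the entry printed above them.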

Skill Execution Tracing and Error Diagnosis:

Skills are the core capability units for Hermes Agent to complete tasks. When a Skill execution fails, logs are the primary source for diagnosing issues.

Skill Call Tracing: Logs will clearly indicate which Skill was called and what parameters were passed. This helps confirm whether the agent correctly understood the task and selected the appropriate Skill.
Skill Internal Errors: If an exception occurs during Skill execution (e.g., a code error or a failed external API call), it is usually recorded at the ERROR or WARNING level, along with a stack trace.

Common Skill Error Diagnosis Scenarios:

Parameter Errors: The agent passed incorrect parameter types or values to the Skill. Logs will show the parameters received by the Skill, which you can compare with the Skill's expected parameters.
External Service Failures: If a Skill relies on external APIs or services (e.g., querying weather, sending emails), when these services are unavailable or return errors, the Skill will capture and log the relevant errors.
Skill Logic Errors: The Skill's own Python code has a bug. The stack trace in the logs will point to the specific line of code where the error occurred.
Permission Issues: The Skill attempted to perform an operation but lacked the necessary permissions.
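Most of these failures are best handled inside the Skill itself, so that the error is logged with its stack trace and the agent still receives a result it can report. A minimal sketch, assuming a hypothetical Python function as the Skill body (Hermes' real Skill interface and logger names may differ), that catches a missing file or a permission problem and logs it at ERROR level:

```python
import logging

# Hypothetical logger name; Hermes' actual Skill interface and logger names may differ.
logger = logging.getLogger("hermes.skills.read_file")


def read_file(file_path):
    """Hypothetical Skill body: return file text, or a structured error the agent can report."""
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            return {"status": "success", "content": f.read()}
    except (FileNotFoundError, PermissionError) as exc:
        # logger.exception logs at ERROR level and appends the current stack trace
        logger.exception("Skill 'read_file' failed with error:")
        return {"status": "error", "error": f"{type(exc).__name__}: {exc}"}


result = read_file("non_existent_file.txt")
print(result["error"])
```

Returning a structured error instead of letting the exception escape gives the agent something concrete to reason about, which matches the "report this error to the user" behavior in the log example that follows.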

```
# Assume we have a Skill named 'read_file', but it tries to read a non-existent file
# Run in DEBUG mode and observe error logs

# First, ensure log_level is debug
hermes config set log_level debug

# Try to make the Agent read a non-existent file (assuming 'non_existent_file.txt' does not exist in the current directory)
# Assume the Agent will try to call a Skill named 'read_file'
hermes "Read the content of the file 'non_existent_file.txt'."
```

Expected Log Output Example (partial):

```
[2023-10-27 10:35:10.123 DEBUG] [AGENT THOUGHT] I need to read a file, I will use the 'read_file' Skill.
[2023-10-27 10:35:10.124 DEBUG] [LLM CALL] Sending prompt to LLM (model: gpt-4-turbo-preview):
...
[2023-10-27 10:35:12.567 DEBUG] [LLM RESPONSE] Received response from LLM:
---
<tool_code>
read_file("non_existent_file.txt")
</tool_code>
---
[2023-10-27 10:35:12.568 DEBUG] [AGENT] Tool call detected: read_file
[2023-10-27 10:35:12.569 DEBUG] [SKILL EXECUTION] Calling skill 'read_file' with args: {'file_path': 'non_existent_file.txt'}
[2023-10-27 10:35:12.570 ERROR] [SKILL] Skill 'read_file' failed with error:
Traceback (most recent call last):
  File ".../hermes-agent/skills/read_file.py", line 15, in execute
    with open(file_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'non_existent_file.txt'
[2023-10-27 10:35:12.571 DEBUG] [AGENT] Skill 'read_file' execution failed. Error: FileNotFoundError: [Errno 2] No such file or directory: 'non_existent_file.txt'
[2023-10-27 10:35:12.572 DEBUG] [AGENT THOUGHT] Skill 'read_file' failed because the file does not exist. I need to report this error to the user.
```

From the logs above, we can clearly see that the read_file Skill attempted to read the file but failed due to a FileNotFoundError, and the agent caught this error and prepared to report it to the user.
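A traceback spans several lines with no timestamp, so diagnosing failures in a long log means attaching those continuation lines to the entry that produced them. A sketch, assuming the log format shown in this lesson's examples, that groups entries and pulls out each ERROR entry together with the last line of its traceback (usually the exception type and message):

```python
import re

# Same header layout as the example logs in this lesson (an assumption about the format).
HEADER = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<level>[A-Z]+)\]"
    r" \[(?P<tag>[A-Z ]+)\] (?P<message>.*)$"
)


def group_entries(lines):
    """Attach continuation lines (prompts, responses, tracebacks) to the entry above."""
    entries = []
    for line in lines:
        match = HEADER.match(line)
        if match:
            entries.append({**match.groupdict(), "details": []})
        elif entries:
            entries[-1]["details"].append(line)
    return entries


def errors_with_cause(lines):
    """Return (message, last detail line) for each ERROR entry."""
    return [
        (e["message"], e["details"][-1] if e["details"] else "")
        for e in group_entries(lines)
        if e["level"] == "ERROR"
    ]


sample = [
    "[2023-10-27 10:35:12.569 DEBUG] [SKILL EXECUTION] Calling skill 'read_file' with args: {'file_path': 'non_existent_file.txt'}",
    "[2023-10-27 10:35:12.570 ERROR] [SKILL] Skill 'read_file' failed with error:",
    "Traceback (most recent call last):",
    '  File ".../hermes-agent/skills/read_file.py", line 15, in execute',
    "    with open(file_path, 'r') as f:",
    "FileNotFoundError: [Errno 2] No such file or directory: 'non_existent_file.txt'",
]
for message, cause in errors_with_cause(sample):
    print(message, "->", cause)
```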

17.3 Overview of LLM Observability Platforms

As applications based on Large Language Models (LLMs) become increasingly complex, traditional logging systems are no longer sufficient to provide comprehensive insights. LLM observability platforms have emerged, designed to provide end-to-end visibility, helping developers understand, debug, and optimize LLM-driven applications. Among them, Langfuse and LangSmith are two leading solutions.

Why are LLM Observability Platforms needed?

Complexity: LLM applications often involve multi-turn conversations, tool calls, RAG (Retrieval-Augmented Generation), agent decision chains, etc., making their internal states and processes difficult to track with simple logs.
Non-determinism: LLM outputs are non-deterministic; the same input can produce different outputs. Observability platforms help understand this variability.
Black Box Issue: The LLM itself is a black box, making it difficult to see its internal reasoning directly. Platforms cannot open the model, but by recording prompts, responses, and intermediate steps they make its observable behavior inspectable.
Performance and Cost: LLM calls consume Tokens and computational resources, incurring costs. Platforms can monitor Token usage, latency, and cost to aid optimization.
Quality and Iteration: Evaluating the quality of LLM applications and making continuous improvements requires extensive experimental data, A/B testing, and user feedback. Platforms provide the ability to collect and analyze this data.

Core Features:

LLM observability platforms typically offer the following core features:

Traces and Spans:
- Trace: Represents the end-to-end execution flow of a complete user request or Agent task. For example, the entire process from user input to the Agent's final response.
- Span: A component of a Trace, representing an independent operation or step. For example, an LLM call, a Skill execution, or a data retrieval operation. Spans can be nested, forming a tree-like structure that clearly shows the causal relationships and temporal sequence of operations.
- Through Traces and Spans, we can visually see all steps of the Agent's decision path, LLM interactions, tool usage, etc., as well as their relationships and time consumption.
Input/Output Recording: Records the complete inputs (e.g., prompts, parameters) and outputs (e.g., LLM responses, Skill results) for each LLM call or Skill execution.
Metadata & Tags: Allows attaching custom metadata (e.g., user ID, session ID, model version) to Traces and Spans for filtering, searching, and aggregated analysis.
Cost & Performance Monitoring: Automatically calculates Token consumption and cost for each LLM call, as well as the latency for each step.
Evaluation & Annotation: Supports manual quality evaluation and annotation of LLM outputs, or integration of automated evaluation metrics, to continuously improve models and applications.
Dataset Management: Extracts valuable prompts, responses, and tool call sequences from actual operational data to build datasets for fine-tuning, regression testing, and evaluation.
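The Trace/Span model is essentially a tree with timings and usage attached to every node. A minimal sketch of that data structure in Python; the field names are our own rather than any platform's schema, the timings come from the Fibonacci log example earlier in this lesson (seconds after the task started), and the token counts are invented for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One step of a trace; the root span represents the whole task."""
    name: str
    kind: str                # illustrative kinds: "agent", "llm", "tool"
    start: float             # seconds since the task began
    end: float
    input_tokens: int = 0
    output_tokens: int = 0
    children: list = field(default_factory=list)

    @property
    def duration(self) -> float:
        return self.end - self.start

    def total_tokens(self) -> int:
        """Tokens used by this span plus every nested span."""
        own = self.input_tokens + self.output_tokens
        return own + sum(child.total_tokens() for child in self.children)


def outline(span, depth=0):
    """Indented view of the tree: kind, name and duration per span."""
    lines = [f"{'  ' * depth}{span.kind}:{span.name} {span.duration:.3f}s"]
    for child in span.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Timings follow the Fibonacci log example; token counts are invented.
trace = Span("fibonacci task", "agent", 0.0, 2.772, children=[
    Span("gpt-4-turbo-preview", "llm", 0.002, 2.767, input_tokens=120, output_tokens=85),
    Span("print_python_code", "tool", 2.769, 2.772),
])
print("\n".join(outline(trace)))
print("total tokens:", trace.total_tokens())  # 205
```

Summing tokens up the tree gives per-task usage; multiplying by your provider's per-token prices gives the kind of cost figure these platforms report, and the per-span durations show at a glance that the LLM call dominates this task's latency.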

Features of Langfuse and LangSmith:

Langfuse (https://langfuse.com/): Open-source and self-hostable, offering powerful tracing, evaluation, prompt engineering, and dataset management features. It integrates tightly with frameworks like LangChain and also supports other LLM frameworks and custom integrations.
LangSmith (https://www.langchain.com/langsmith): The official platform provided by LangChain, seamlessly integrated with the LangChain ecosystem, offering similar tracing, debugging, evaluation, and monitoring capabilities.

These platforms significantly enhance the development, debugging, and maintenance efficiency of LLM applications by providing structured data capture and powerful visualization interfaces.

17.4 Integration Strategies for Hermes Agent with External Observability Platforms

Hermes Agent is an independent CLI tool, with its core interactions happening via the terminal. While it doesn't have direct built-in client integration for Langfuse or LangSmith, we can bridge its operational data with these external observability platforms through several strategies to achieve more advanced monitoring and analysis.

Integration strategies primarily revolve around how to capture key events from Hermes Agent's operations and transform them into the structured data format required by observability platforms (e.g., Trace and Span).

Strategy One: Post-processing Hermes Agent's Detailed Log Output

This is the most direct method and requires no modification to Hermes Agent's core code.

Enable Detailed Logging: Set Hermes Agent's log_level to DEBUG to ensure all internal thoughts, LLM calls, Skill executions, etc., are output.
Capture Log Stream: Redirect Hermes Agent's standard output to a file, or pipe it to a custom log parser.
Log Parser: Write a Python script or a separate process to parse Hermes Agent's log files in real-time or periodically. This parser needs to:
- Identify key log patterns (e.g., [AGENT THOUGHT], [LLM CALL], [SKILL EXECUTION], [ERROR]).
- Extract structured information from log entries (e.g., LLM model name, prompt content, response content, Skill name, parameters, execution result, error messages).
- Map this information to the Trace and Span models of an observability platform (e.g., Langfuse SDK).
For example, when the parser sees [LLM CALL], it can create a Langfuse generation (the observation type Langfuse uses for LLM calls); when it sees [SKILL EXECUTION], it can create a span for the tool call. Linking each observation to its trace ID and parent observation ID reconstructs the complete execution chain.
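The parser described above can be sketched in a few dozen lines. This version assumes the log format from this lesson's examples and an illustrative mapping: LLM CALL opens an llm span, SKILL EXECUTION opens a tool span, and everything else is attached to a root agent span. It builds plain dictionaries rather than calling the Langfuse SDK, because the SDK's API differs between major versions; a separate uploader would then send each llm span as a Langfuse generation and each tool span as a span.

```python
import re
import uuid

# Header pattern for the log format shown in this lesson (an assumption).
HEADER = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<level>[A-Z]+)\]"
    r" \[(?P<tag>[A-Z ]+)\] (?P<message>.*)$"
)
# Illustrative mapping: which tags open a new child span, and of what kind.
OPENS_SPAN = {"LLM CALL": "llm", "SKILL EXECUTION": "tool"}


def new_span(kind, parent_id, start):
    return {"id": uuid.uuid4().hex, "parent_id": parent_id, "kind": kind,
            "start": start, "input": [], "output": [], "status": "ok"}


def log_to_trace(lines):
    """Convert DEBUG log lines into one trace: a root agent span plus child spans."""
    root = new_span("agent", None, None)
    trace = {"id": uuid.uuid4().hex, "spans": [root]}
    latest = {}               # most recent span per kind
    target = root["output"]   # where untagged continuation lines go
    for line in lines:
        match = HEADER.match(line)
        if match is None:
            target.append(line)  # prompt bodies, responses, tracebacks
            continue
        ts, level, tag, message = match.group("ts", "level", "tag", "message")
        if root["start"] is None:
            root["start"] = ts
        if tag in OPENS_SPAN:
            span = new_span(OPENS_SPAN[tag], root["id"], ts)
            span["input"].append(message)
            trace["spans"].append(span)
            latest[span["kind"]] = span
            target = span["input"]
        elif tag == "LLM RESPONSE" and "llm" in latest:
            target = latest["llm"]["output"]
        elif tag == "SKILL" and "tool" in latest:
            tool = latest["tool"]
            tool["output"].append(message)
            if level == "ERROR":
                tool["status"] = "error"
            target = tool["output"]
        else:
            root["output"].append(f"{tag}: {message}")
            target = root["output"]
    return trace


sample = [
    "[2023-10-27 10:35:10.123 DEBUG] [AGENT THOUGHT] I need to read a file, I will use the 'read_file' Skill.",
    "[2023-10-27 10:35:10.124 DEBUG] [LLM CALL] Sending prompt to LLM (model: gpt-4-turbo-preview):",
    "[2023-10-27 10:35:12.567 DEBUG] [LLM RESPONSE] Received response from LLM:",
    'read_file("non_existent_file.txt")',
    "[2023-10-27 10:35:12.569 DEBUG] [SKILL EXECUTION] Calling skill 'read_file' with args: {'file_path': 'non_existent_file.txt'}",
    "[2023-10-27 10:35:12.570 ERROR] [SKILL] Skill 'read_file' failed with error:",
    "FileNotFoundError: [Errno 2] No such file or directory: 'non_existent_file.txt'",
]
for span in log_to_trace(sample)["spans"]:
    print(span["kind"], span["status"], span["input"][:1])
```

Keeping every child directly under the root is a simplification; a fuller parser would also track nested Skill calls, which is exactly where log parsing starts to get fragile.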

Advantages: No modification to Hermes Agent source code required, highly versatile. Disadvantages: Parsing can be complex, potential for missed or incorrect reports, real-time capability depends on the parser's implementation.
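For the "Capture Log Stream" step, the output can be written to a file and handed to the parser at the same time, which is what makes near-real-time reporting possible. A sketch using Python's subprocess module; the file names are placeholders, and stderr is merged into the stream in case some output goes there:

```python
import subprocess
import sys


def run_and_tee(cmd, log_path):
    """Run cmd, write each output line to log_path, and yield it for live parsing."""
    with open(log_path, "w", encoding="utf-8") as log_file, subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1
    ) as proc:
        for line in proc.stdout:
            log_file.write(line)
            log_file.flush()          # keep the file current for other readers
            yield line.rstrip("\n")
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, cmd)


# Self-contained demo with a throwaway child process:
demo = [sys.executable, "-c", "print('[a] first'); print('second')"]
print(list(run_and_tee(demo, "tee_demo.log")))

# With Hermes (requires the hermes CLI on PATH):
# for line in run_and_tee(["hermes", "Read the content of the file 'non_existent_file.txt'."], "hermes_debug.log"):
#     print(line)
```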

Strategy Two: Reporting Key Events via Custom Skills

If you want to report specific Agent behaviors as they occur, you can create custom Skills to interact with observability platforms.

Create Reporting Skills: Write one or more new Hermes Skills, such as report_llm_call, report_skill_execution, report_agent_thought.
Integrate SDK: Integrate the Python SDK of Langfuse or LangSmith within the Python code of these Skills.
Agent Decision Invocation: Through prompt engineering, guide the Agent to actively invoke these reporting Skills after completing an LLM call or Skill execution, passing relevant information to them.

For example, after completing a code_generation Skill, the Agent could call report_skill_execution(skill_name="code_generation", result="success", ...).
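A reporting Skill can stay small. The sketch below is hypothetical: its signature is not Hermes' actual Skill interface, and instead of calling the Langfuse or LangSmith SDK directly it appends one JSON event per call to a local file (hermes_events.jsonl, an invented name) that a separate uploader process could forward. Buffering like this keeps a slow or unreachable platform from stalling the agent's own task.

```python
import json
import time
from pathlib import Path

# Hypothetical local buffer; an uploader process would forward these events.
EVENTS_FILE = Path("hermes_events.jsonl")


def report_skill_execution(skill_name, result, error=None, metadata=None,
                           events_file=EVENTS_FILE):
    """Hypothetical reporting Skill: append one JSON event per call."""
    event = {
        "type": "skill_execution",
        "skill_name": skill_name,
        "result": result,            # e.g. "success" or "failure"
        "error": error,
        "metadata": metadata or {},  # session ID, model version, and similar tags
        "timestamp": time.time(),
    }
    with open(events_file, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return {"status": "reported", "skill_name": skill_name}


print(report_skill_execution("code_generation", "success",
                             metadata={"session_id": "demo"}))
```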

Advantages: More precise event reporting, capable of capturing higher-level internal Agent decisions. Disadvantages: Requires modifying the Agent's behavior (via prompt guidance), increases the complexity of the Agent task itself, and may not capture all low-level details.

Strategy Three: Modifying Hermes Agent Core Code (Advanced)

For scenarios requiring deep integration and maximum control, you can directly modify Hermes Agent's Python source code to inject observability platform SDK calls at its critical execution points (e.g., where LLM calls are made, within the Skill executor).

Locate Key Code: Identify the code locations in Hermes Agent responsible for LLM calls, Skill scheduling, and the Agent's thought loop.
Inject SDK Calls: At these locations, add trace, span, event, and other API calls from the Langfuse or LangSmith SDK to send corresponding inputs, outputs, and metadata to the platform.

Advantages: The most thorough integration, capturing every detail with real-time, structured reporting. Disadvantages: Requires familiarity with Hermes Agent's internal implementation, leaves you maintaining the modifications yourself, and makes upgrading to new Hermes releases harder.
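A common way to inject those calls is a decorator around the functions that perform LLM calls and Skill dispatch. A sketch of the pattern follows; call_llm, run_skill, and handle_task are hypothetical stand-ins for Hermes internals, and the RECORDED list stands in for an SDK client. A context variable tracks the active span, so nested calls get the right parent automatically. Langfuse's Python SDK (observe) and LangSmith's (traceable) ship ready-made decorators built on the same idea; check the documentation for your SDK version before wiring them in.

```python
import contextvars
import functools
import time
import uuid

_current_span = contextvars.ContextVar("current_span", default=None)
RECORDED = []  # stand-in for an observability SDK client


def observe(kind):
    """Record a span around each call; nested calls get the enclosing span as parent."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            parent = _current_span.get()
            span = {"id": uuid.uuid4().hex,
                    "parent_id": parent["id"] if parent else None,
                    "name": func.__name__, "kind": kind,
                    "input": {"args": args, "kwargs": kwargs}, "status": "ok"}
            token = _current_span.set(span)
            start = time.perf_counter()
            try:
                span["output"] = func(*args, **kwargs)
                return span["output"]
            except Exception as exc:
                span["status"] = "error"
                span["output"] = repr(exc)
                raise
            finally:
                span["duration"] = time.perf_counter() - start
                _current_span.reset(token)
                RECORDED.append(span)   # a real integration would send it here
        return wrapper
    return decorator


# Hypothetical stand-ins for Hermes internals:
@observe("llm")
def call_llm(prompt):
    return '<tool_code>read_file("notes.txt")</tool_code>'


@observe("tool")
def run_skill(name, args):
    return {"skill": name, "status": "success"}


@observe("agent")
def handle_task(task):
    call_llm(task)
    return run_skill("read_file", {"file_path": "notes.txt"})


handle_task("Summarize notes.txt")
for span in RECORDED:
    print(span["kind"], span["name"], span["parent_id"] is None)
```

Spans are recorded when they finish, so children appear before their parent; platforms rebuild the tree from the parent IDs rather than from arrival order.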

Conceptual Steps for Implementing Langfuse Integration (Based on Strategy One or Three):

Here, using Langfuse as an example, we assume integration via a log parser or direct code modification.

Install Langfuse SDK:
```
pip install langfuse
```
Configure Langfuse Environment Variables:
```
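export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_HOST="https://cloud.langfuse.com"  # or your self-hosted URL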
```
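Before an integration script starts sending data, it helps to fail fast when credentials are missing. A small sketch; it treats the public and secret keys as required and, as an assumption, falls back to the Langfuse Cloud URL when LANGFUSE_HOST is unset:

```python
import os

DEFAULT_HOST = "https://cloud.langfuse.com"  # assumed fallback when no host is set


def langfuse_settings(env=None):
    """Read Langfuse credentials from the environment, failing fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")
               if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Langfuse settings: " + ", ".join(missing))
    return {
        "public_key": env["LANGFUSE_PUBLIC_KEY"],
        "secret_key": env["LANGFUSE_SECRET_KEY"],
        "host": env.get("LANGFUSE_HOST", DEFAULT_HOST),
    }


try:
    settings = langfuse_settings()
    print("Langfuse host:", settings["host"])  # never print the secret key
except RuntimeError as err:
    print(err)
```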
