Logging, Monitoring & Improving Agents with LangGraph + Weights & Biases

December 27, 2024


Add observability and performance tuning to your agentic AI stack

Agents are only as effective as your ability to monitor and improve them. In this tutorial, you’ll learn how to use LangGraph for structured agent workflows, and Weights & Biases (W&B) for logging, evaluation, and performance tracking. Whether you’re debugging prompt chains, tracking tool usage, or comparing versions of your agent architecture, this combo gives you the observability layer needed to confidently scale agentic AI.


🎯 Why Observability Matters in Agentic AI

Agentic applications are stateful, tool-using, and often recursive—which means:

  • Output quality varies by step, tool, or prompt
  • Bugs are hard to catch in chain-of-thought reasoning
  • Fine-tuning agent behavior requires data, not guesswork

By combining:

  • LangGraph, which gives you deterministic workflows via DAGs
  • W&B, which logs metrics, inputs/outputs, and version diffs

…you gain full insight into how your agent thinks, acts, and improves.

🧰 What You’ll Use

Tool                      Purpose
LangGraph                 DAG-based agent orchestration
Weights & Biases          Experiment tracking and logging
LangChain                 LLM access, tools, memory
OpenAI / Hugging Face     LLM model interface

📦 What You’ll Build

A LangGraph agent with three steps:

  1. Accepts a research query
  2. Searches for results (using a tool or simulated API)
  3. Summarizes and returns an answer

At each step, you'll do the following (a reusable logging hook is sketched after the list):

  • ✅ Log input/output data
  • ✅ Measure latency
  • ✅ Track model response length
  • ✅ Tag with experiment metadata (e.g., prompt version, model)
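Step 3 below logs directly inside each node, which keeps the flow easy to follow. If you'd rather keep the node functions clean, the same bookkeeping can be factored into a decorator. Here's a minimal sketch; the logged_step helper is hypothetical, not part of LangGraph or W&B:

import functools
import time
import wandb

def logged_step(step_name):
    """Hypothetical helper: wrap a LangGraph node and log its latency and output size."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(state):
            start = time.time()
            new_state = fn(state)  # run the actual node
            wandb.log({
                "step": step_name,
                "latency": time.time() - start,
                # whitespace word count of the state as a rough size proxy
                "state_size": len(str(new_state).split()),
            })
            return new_state
        return wrapper
    return decorator

# Usage sketch: decorate a node before adding it to the graph
# @logged_step("search_web")
# def search_web(state): ...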

✅ Step 1: Install Required Packages

pip install langgraph langchain openai wandb duckduckgo-search

✅ Step 2: Set Up W&B

  • Create a free account at wandb.ai
  • Log in from the terminal:

wandb login

  • Optionally, initialize a project from Python:

import wandb
wandb.init(project="agent-observability", name="langgraph-v1")

✅ Step 3: Define Agent Nodes with Logging Hooks

from typing import TypedDict

from langgraph.graph import StateGraph
from langchain.chat_models import ChatOpenAI
from langchain.tools import DuckDuckGoSearchRun
import wandb
import time

llm = ChatOpenAI()
search_tool = DuckDuckGoSearchRun()
wandb.init(project="agent-observability", name="run-001")

class AgentState(TypedDict, total=False):
    query: str
    search_result: str
    summary: str

def get_query(state):
    query = state.get("query", "What is LangGraph?")
    wandb.log({"step": "get_query", "query": query})
    state["query"] = query
    return state

def search_web(state):
    start = time.time()
    result = search_tool.run(state["query"])
    end = time.time()

    wandb.log({
        "step": "search_web",
        "search_result": result,
        "latency": end - start
    })
    state["search_result"] = result
    return state

def summarize(state):
    prompt = f"Summarize this: {state['search_result']}"
    output = llm.predict(prompt)

    wandb.log({
        "step": "summarize",
        "prompt": prompt,
        "response": output,
        "tokens_used": len(output.split())
    })
    state["summary"] = output
    return state

✅ Step 4: Build the LangGraph DAG

graph = StateGraph(AgentState)
graph.add_node("GetQuery", get_query)
graph.add_node("Search", search_web)
graph.add_node("Summarize", summarize)

graph.set_entry_point("GetQuery")
graph.add_edge("GetQuery", "Search")
graph.add_edge("Search", "Summarize")
graph.set_finish_point("Summarize")

executor = graph.compile()

✅ Step 5: Run the Agent and Monitor

state = AgentState(query="How does CrewAI compare to LangGraph?")
result = executor.invoke(state)
print("SUMMARY:", result["summary"])

wandb.finish()  # close the run so all logged steps are flushed to the dashboard

➡️ Head to your Weights & Biases dashboard to view:

  • Step-by-step logs
  • Token usage
  • Latency per agent action
  • Prompt → Output comparisons

🔁 Optional: Track Prompt Versions and Model Swaps


wandb.config.update({
    "prompt_template": "Summarize this: {text}",
    "llm_model": "gpt-3.5-turbo"
})

This helps when A/B testing (see the sketch after this list):

  • Different prompt strategies
  • LLM temperature settings
  • Multi-agent flow variations
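As a concrete sketch, the loop below starts one W&B run per temperature setting so results can be compared side by side in the dashboard. The project name, run names, and prompt are illustrative, and the loop reuses the same ChatOpenAI interface as Step 3:

import wandb
from langchain.chat_models import ChatOpenAI

# Illustrative sweep: one W&B run per temperature value
for temperature in [0.0, 0.3, 0.7]:
    run = wandb.init(
        project="agent-observability",
        name=f"summarize-temp-{temperature}",
        config={"llm_model": "gpt-3.5-turbo", "temperature": temperature},
        reinit=True,  # allow several runs from a single script
    )
    llm = ChatOpenAI(temperature=temperature)
    output = llm.predict("Summarize this: LangGraph organizes agent steps as nodes in a graph.")
    wandb.log({"response": output, "response_length": len(output.split())})
    run.finish()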

📊 Visualization Ideas in W&B

Metric                Use Case
Token count           Optimize for cost-efficiency
Latency per node      Identify performance bottlenecks
Prompt-to-output      Spot hallucination or bad summaries
Version diffs         Evaluate performance between prompt revisions

Use W&B Tables or custom dashboards for more insight.
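For example, prompt/response pairs can be collected in a wandb.Table and logged once per run, which gives you a sortable side-by-side view in the dashboard. The table name and columns below are illustrative:

import wandb

wandb.init(project="agent-observability", name="prompt-output-table")

# Collect prompt/response pairs for side-by-side review
comparison = wandb.Table(columns=["step", "prompt", "response", "response_length"])

# e.g. values produced by the summarize node in Step 3
prompt = "Summarize this: LangGraph organizes agent steps as nodes in a graph."
response = "LangGraph structures an agent's workflow as a graph of connected nodes."
comparison.add_data("summarize", prompt, response, len(response.split()))

wandb.log({"prompt_output_pairs": comparison})
wandb.finish()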


