
Add observability and performance tuning to your agentic AI stack
January 3, 2025
Agents are only as effective as your ability to monitor and improve them. In this tutorial, you’ll learn how to use LangGraph for structured agent workflows, and Weights & Biases (W&B) for logging, evaluation, and performance tracking. Whether you’re debugging prompt chains, tracking tool usage, or comparing versions of your agent architecture, this combo gives you the observability layer needed to confidently scale agentic AI.
🎯 Why Observability Matters in Agentic AI
Agentic applications are stateful, tool-using, and often recursive—which means:
- Output quality varies by step, tool, or prompt
- Bugs are hard to catch in chain-of-thought reasoning
- Fine-tuning agent behavior requires data, not guesswork
By combining:
- LangGraph, which gives you deterministic workflows via DAGs
- W&B, which logs metrics, inputs/outputs, and version diffs
…you gain full insight into how your agent thinks, acts, and improves.
🧰 What You’ll Use
| Tool | Purpose |
|---|---|
| LangGraph | DAG-based agent orchestration |
| Weights & Biases | Experiment tracking and logging |
| LangChain | LLM access, tools, memory |
| OpenAI / Hugging Face | LLM model interface |
📦 What You’ll Build
A LangGraph agent with three steps:
- Accepts a research query
- Searches for results (using a tool or simulated API)
- Summarizes and returns an answer
At each step, you’ll:
- ✅ Log input/output data
- ✅ Measure latency
- ✅ Track model response length
- ✅ Tag with experiment metadata (e.g., prompt version, model)
✅ Step 1: Install Required Packages
```bash
# DuckDuckGoSearchRun (used below) also needs the duckduckgo-search package
pip install langgraph langchain openai wandb duckduckgo-search
```
✅ Step 2: Set Up W&B
- Create a free account at wandb.ai
- Login from terminal:
```bash
wandb login
```
- Optionally, define a project:
```python
import wandb

wandb.init(project="agent-observability", name="langgraph-v1")
```
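If an interactive `wandb login` isn't convenient (CI, containers, some notebooks), W&B also reads the API key from the `WANDB_API_KEY` environment variable. A small sketch; the key value is a placeholder:
```python
import os

# non-interactive alternative to `wandb login`
os.environ["WANDB_API_KEY"] = "your-api-key-here"
```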
✅ Step 3: Define Agent Nodes with Logging Hooks
```python
import time
from typing import TypedDict

import wandb
from langgraph.graph import StateGraph
from langchain.chat_models import ChatOpenAI  # moved to langchain_openai in newer LangChain releases
from langchain.tools import DuckDuckGoSearchRun  # moved to langchain_community.tools in newer releases

llm = ChatOpenAI()
search_tool = DuckDuckGoSearchRun()

wandb.init(project="agent-observability", name="run-001")

# Typed state schema so LangGraph knows which keys flow through the graph
class AgentState(TypedDict, total=False):
    query: str
    search_result: str
    summary: str

def get_query(state):
    query = state.get("query", "What is LangGraph?")
    wandb.log({"step": "get_query", "query": query})
    state["query"] = query
    return state

def search_web(state):
    start = time.time()
    result = search_tool.run(state["query"])
    end = time.time()
    wandb.log({
        "step": "search_web",
        "search_result": result,
        "latency": end - start,
    })
    state["search_result"] = result
    return state

def summarize(state):
    prompt = f"Summarize this: {state['search_result']}"
    output = llm.predict(prompt)
    wandb.log({
        "step": "summarize",
        "prompt": prompt,
        "response": output,
        "tokens_used": len(output.split()),  # rough proxy: whitespace word count, not true tokens
    })
    state["summary"] = output
    return state
```
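Each node above repeats the same "time it, log it" boilerplate. If you'd rather keep nodes lean, one option is to factor the logging into a decorator. This is a sketch, not part of the original tutorial: `logged_node` and `search_web_v2` are made-up names, and it reuses the `search_tool` defined above.
```python
import functools
import time

import wandb

def logged_node(step_name):
    """Wrap a LangGraph node so its latency and resulting state keys are logged to W&B."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(state):
            start = time.time()
            new_state = fn(state)
            wandb.log({
                "step": step_name,
                "latency": time.time() - start,
                "state_keys": list(new_state.keys()),  # keep logged payloads lightweight
            })
            return new_state
        return wrapper
    return decorator

@logged_node("search_web")
def search_web_v2(state):
    state["search_result"] = search_tool.run(state["query"])
    return state
```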
✅ Step 4: Build the LangGraph DAG
```python
graph = StateGraph(AgentState)

graph.add_node("GetQuery", get_query)
graph.add_node("Search", search_web)
graph.add_node("Summarize", summarize)

graph.set_entry_point("GetQuery")
graph.add_edge("GetQuery", "Search")
graph.add_edge("Search", "Summarize")
graph.set_finish_point("Summarize")

executor = graph.compile()
```
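Depending on your LangGraph version, you may also see the finish point written as an edge to the built-in `END` node; both forms mark the same terminal step:
```python
from langgraph.graph import END

# equivalent to graph.set_finish_point("Summarize")
graph.add_edge("Summarize", END)
```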
✅ Step 5: Run the Agent and Monitor
```python
state = AgentState(query="How does CrewAI compare to LangGraph?")
result = executor.invoke(state)

print("SUMMARY:", result["summary"])
```
➡️ Head to your Weights & Biases dashboard to view:
- Step-by-step logs
- Token usage
- Latency per agent action
- Prompt → Output comparisons
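If you'd rather pull the same logged data programmatically, the W&B public API can read runs back out of the project. A minimal sketch, assuming the project name used above; replace "your-entity" with your own W&B entity:
```python
import wandb

api = wandb.Api()
runs = api.runs("your-entity/agent-observability")

for run in runs:
    # run.summary holds the last logged value for each metric key
    print(run.name, run.summary.get("latency"), run.summary.get("tokens_used"))
```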
🔁 Optional: Track Prompt Versions and Model Swaps
```python
wandb.config.update({
    "prompt_template": "Summarize this: {text}",
    "llm_model": "gpt-3.5-turbo"
})
```
This helps when A/B testing (see the sketch after this list):
- Different prompt strategies
- LLM temperature settings
- Multi-agent flow variations
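A rough way to run such a comparison is one W&B run per variant, so the dashboard can diff them side by side. The variants below are illustrative placeholders, not values from this tutorial:
```python
import wandb

variants = [
    {"prompt_template": "Summarize this: {text}", "llm_model": "gpt-3.5-turbo", "temperature": 0.2},
    {"prompt_template": "Give a 3-bullet summary of: {text}", "llm_model": "gpt-4", "temperature": 0.7},
]

for i, cfg in enumerate(variants):
    run = wandb.init(
        project="agent-observability",
        name=f"ab-test-variant-{i}",
        config=cfg,
        reinit=True,  # allow multiple runs in one process
    )
    # ...rebuild the agent with cfg and invoke it here...
    run.finish()
```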
📊 Visualization Ideas in W&B
| Metric | Use Case |
|---|---|
| Token count | Optimize for cost-efficiency |
| Latency per node | Identify performance bottlenecks |
| Prompt-to-output | Spot hallucination or bad summaries |
| Version diffs | Evaluate performance between prompt revisions |
Use W&B Tables or custom dashboards for more insight.
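For example, a W&B Table lets you log prompt/output pairs as filterable rows in the UI. A minimal sketch; the column names and the single hard-coded row are just placeholders:
```python
import wandb

results_table = wandb.Table(columns=["step", "prompt", "response", "latency"])

# add one row per agent step (here, a dummy row for illustration)
results_table.add_data("summarize", "Summarize this: ...", "LangGraph is ...", 1.42)

wandb.log({"agent_steps": results_table})
```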
📚 Additional Resources
- 🔗 LangGraph Docs
- 🔗 W&B Python Guide
- 🔗 LangChain Agents + Tools
- 🧠 Best Practices for LLM Evaluation (Medium.com Article)



