
Add observability and performance tuning to your agentic AI stack
January 3, 2025
Agents are only as effective as your ability to monitor and improve them. In this tutorial, you’ll learn how to use LangGraph for structured agent workflows, and Weights & Biases (W&B) for logging, evaluation, and performance tracking. Whether you’re debugging prompt chains, tracking tool usage, or comparing versions of your agent architecture, this combo gives you the observability layer needed to confidently scale agentic AI.
🎯 Why Observability Matters in Agentic AI
Agentic applications are stateful, tool-using, and often recursive—which means:
- Output quality varies by step, tool, or prompt
- Bugs are hard to catch in chain-of-thought reasoning
- Fine-tuning agent behavior requires data, not guesswork
By combining:
- LangGraph, which gives you deterministic workflows via DAGs
- W&B, which logs metrics, inputs/outputs, and version diffs
…you gain full insight into how your agent thinks, acts, and improves.
🧰 What You’ll Use
| Tool | Purpose |
|---|---|
| LangGraph | DAG-based agent orchestration |
| Weights & Biases | Experiment tracking and logging |
| LangChain | LLM access, tools, memory |
| OpenAI / Hugging Face | LLM model interface |
📦 What You’ll Build
A LangGraph agent with three steps:
- Accepts a research query
- Searches for results (using a tool or simulated API)
- Summarizes and returns an answer
At each step, you’ll:
- ✅ Log input/output data
- ✅ Measure latency
- ✅ Track model response length
- ✅ Tag with experiment metadata (e.g., prompt version, model)
✅ Step 1: Install Required Packages
```bash
# DuckDuckGoSearchRun (used below) also needs the duckduckgo-search package
pip install langgraph langchain openai wandb duckduckgo-search
```
✅ Step 2: Set Up W&B
- Create a free account at wandb.ai
- Login from terminal:
```bash
wandb login
```
- Optionally, define a project:
```python
import wandb

wandb.init(project="agent-observability", name="langgraph-v1")
```
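If an interactive `wandb login` isn't convenient (CI, containers, some notebooks), W&B also reads the API key from the `WANDB_API_KEY` environment variable. A small sketch; the key value is a placeholder:
```python
import os

# non-interactive alternative to `wandb login`
os.environ["WANDB_API_KEY"] = "your-api-key-here"
```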
✅ Step 3: Define Agent Nodes with Logging Hooks
```python
import time
from typing import TypedDict

import wandb
from langgraph.graph import StateGraph
from langchain.chat_models import ChatOpenAI  # moved to langchain_openai in newer LangChain releases
from langchain.tools import DuckDuckGoSearchRun  # moved to langchain_community.tools in newer releases

llm = ChatOpenAI()
search_tool = DuckDuckGoSearchRun()

wandb.init(project="agent-observability", name="run-001")

# Typed state schema so LangGraph knows which keys flow through the graph
class AgentState(TypedDict, total=False):
    query: str
    search_result: str
    summary: str

def get_query(state):
    query = state.get("query", "What is LangGraph?")
    wandb.log({"step": "get_query", "query": query})
    state["query"] = query
    return state

def search_web(state):
    start = time.time()
    result = search_tool.run(state["query"])
    end = time.time()
    wandb.log({
        "step": "search_web",
        "search_result": result,
        "latency": end - start,
    })
    state["search_result"] = result
    return state

def summarize(state):
    prompt = f"Summarize this: {state['search_result']}"
    output = llm.predict(prompt)
    wandb.log({
        "step": "summarize",
        "prompt": prompt,
        "response": output,
        "tokens_used": len(output.split()),  # rough proxy: whitespace word count, not true tokens
    })
    state["summary"] = output
    return state
```
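Each node above repeats the same "time it, log it" boilerplate. If you'd rather keep nodes lean, one option is to factor the logging into a decorator. This is a sketch, not part of the original tutorial: `logged_node` and `search_web_v2` are made-up names, and it reuses the `search_tool` defined above.
```python
import functools
import time

import wandb

def logged_node(step_name):
    """Wrap a LangGraph node so its latency and resulting state keys are logged to W&B."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(state):
            start = time.time()
            new_state = fn(state)
            wandb.log({
                "step": step_name,
                "latency": time.time() - start,
                "state_keys": list(new_state.keys()),  # keep logged payloads lightweight
            })
            return new_state
        return wrapper
    return decorator

@logged_node("search_web")
def search_web_v2(state):
    state["search_result"] = search_tool.run(state["query"])
    return state
```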
✅ Step 4: Build the LangGraph DAG
```python
graph = StateGraph(AgentState)

graph.add_node("GetQuery", get_query)
graph.add_node("Search", search_web)
graph.add_node("Summarize", summarize)

graph.set_entry_point("GetQuery")
graph.add_edge("GetQuery", "Search")
graph.add_edge("Search", "Summarize")
graph.set_finish_point("Summarize")

executor = graph.compile()
```
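Depending on your LangGraph version, you may also see the finish point written as an edge to the built-in `END` node; both forms mark the same terminal step:
```python
from langgraph.graph import END

# equivalent to graph.set_finish_point("Summarize")
graph.add_edge("Summarize", END)
```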
✅ Step 5: Run the Agent and Monitor
```python
state = AgentState(query="How does CrewAI compare to LangGraph?")
result = executor.invoke(state)

print("SUMMARY:", result["summary"])
```
➡️ Head to your Weights & Biases dashboard to view:
- Step-by-step logs
- Token usage
- Latency per agent action
- Prompt → Output comparisons
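If you'd rather pull the same logged data programmatically, the W&B public API can read runs back out of the project. A minimal sketch, assuming the project name used above; replace "your-entity" with your own W&B entity:
```python
import wandb

api = wandb.Api()
runs = api.runs("your-entity/agent-observability")

for run in runs:
    # run.summary holds the last logged value for each metric key
    print(run.name, run.summary.get("latency"), run.summary.get("tokens_used"))
```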
🔁 Optional: Track Prompt Versions and Model Swaps
```python
wandb.config.update({
    "prompt_template": "Summarize this: {text}",
    "llm_model": "gpt-3.5-turbo"
})
```
This helps when A/B testing (see the sketch after this list):
- Different prompt strategies
- LLM temperature settings
- Multi-agent flow variations
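A rough way to run such a comparison is one W&B run per variant, so the dashboard can diff them side by side. The variants below are illustrative placeholders, not values from this tutorial:
```python
import wandb

variants = [
    {"prompt_template": "Summarize this: {text}", "llm_model": "gpt-3.5-turbo", "temperature": 0.2},
    {"prompt_template": "Give a 3-bullet summary of: {text}", "llm_model": "gpt-4", "temperature": 0.7},
]

for i, cfg in enumerate(variants):
    run = wandb.init(
        project="agent-observability",
        name=f"ab-test-variant-{i}",
        config=cfg,
        reinit=True,  # allow multiple runs in one process
    )
    # ...rebuild the agent with cfg and invoke it here...
    run.finish()
```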
📊 Visualization Ideas in W&B
| Metric | Use Case |
|---|---|
| Token count | Optimize for cost-efficiency |
| Latency per node | Identify performance bottlenecks |
| Prompt-to-output | Spot hallucination or bad summaries |
| Version diffs | Evaluate performance between prompt revisions |
Use W&B Tables or custom dashboards for more insight.
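For example, a W&B Table lets you log prompt/output pairs as filterable rows in the UI. A minimal sketch; the column names and the single hard-coded row are just placeholders:
```python
import wandb

results_table = wandb.Table(columns=["step", "prompt", "response", "latency"])

# add one row per agent step (here, a dummy row for illustration)
results_table.add_data("summarize", "Summarize this: ...", "LangGraph is ...", 1.42)

wandb.log({"agent_steps": results_table})
```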
📚 Additional Resources
- 🔗 LangGraph Docs
- 🔗 W&B Python Guide
- 🔗 LangChain Agents + Tools
- 🧠 Best Practices for LLM Evaluation (Medium.com Article)



