December 27, 2024
Add observability and performance tuning to your agentic AI stack
Agents are only as effective as your ability to monitor and improve them. In this tutorial, you’ll learn how to use LangGraph for structured agent workflows, and Weights & Biases (W&B) for logging, evaluation, and performance tracking. Whether you’re debugging prompt chains, tracking tool usage, or comparing versions of your agent architecture, this combo gives you the observability layer needed to confidently scale agentic AI.
Agentic applications are stateful, tool-using, and often recursive, which makes them hard to debug with print statements alone: you need to know what each step received, what it produced, and how long it took.

By combining the tools below, you get that observability layer:
| Tool | Purpose |
|---|---|
| LangGraph | Graph-based agent orchestration |
| Weights & Biases | Experiment tracking and logging |
| LangChain | LLM access, tools, memory |
| OpenAI / Hugging Face | LLM model interface |
You'll build a LangGraph agent with three steps:

1. GetQuery: read the user's query from the state
2. Search: run a DuckDuckGo web search on that query
3. Summarize: summarize the search result with an LLM

At each step, you'll log inputs, outputs, and latency to W&B so every run is fully traceable.
Install the dependencies (the DuckDuckGo search tool also needs the duckduckgo-search package):

```bash
pip install langgraph langchain openai wandb duckduckgo-search
```

Log in to Weights & Biases:

```bash
wandb login
```

Then initialize a run:

```python
import wandb

wandb.init(project="agent-observability", name="langgraph-v1")
```
Now define the agent state and the three nodes. Each node logs what it received, what it produced, and (where relevant) how long it took:

```python
from langgraph.graph import StateGraph
from langchain.chat_models import ChatOpenAI
from langchain.tools import DuckDuckGoSearchRun
import wandb
import time

llm = ChatOpenAI()  # requires OPENAI_API_KEY in your environment
search_tool = DuckDuckGoSearchRun()

wandb.init(project="agent-observability", name="run-001")

class AgentState(dict):
    pass

def get_query(state):
    # Fall back to a default query so the graph can run without input
    query = state.get("query", "What is LangGraph?")
    wandb.log({"step": "get_query", "query": query})
    state["query"] = query
    return state

def search_web(state):
    # Time the tool call so per-node latency shows up in W&B
    start = time.time()
    result = search_tool.run(state["query"])
    end = time.time()
    wandb.log({
        "step": "search_web",
        "search_result": result,
        "latency": end - start
    })
    state["search_result"] = result
    return state

def summarize(state):
    prompt = f"Summarize this: {state['search_result']}"
    output = llm.predict(prompt)
    wandb.log({
        "step": "summarize",
        "prompt": prompt,
        "response": output,
        "tokens_used": len(output.split())  # word count as a rough token proxy
    })
    state["summary"] = output
    return state
```
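The three nodes repeat the same logging boilerplate. If you'd rather keep that out of the node logic, a small wrapper can time any node and log its latency (and any error) for you. This is a sketch of my own, not a LangGraph or W&B API; the `logged_node` name and the logged field names are illustrative:

```python
import functools
import time

import wandb

def logged_node(name):
    """Wrap a LangGraph node so its latency and failures are logged to W&B."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(state):
            start = time.time()
            try:
                return fn(state)
            except Exception as exc:
                # Surface tool/LLM failures in the same run as the metrics
                wandb.log({"step": name, "error": str(exc)})
                raise
            finally:
                wandb.log({"step": name, "latency": time.time() - start})
        return wrapper
    return decorator

# Illustrative usage: graph.add_node("Search", logged_node("search_web")(search_web))
```

The graph wiring below works the same whether or not you use a wrapper like this.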
```python
graph = StateGraph(AgentState)

graph.add_node("GetQuery", get_query)
graph.add_node("Search", search_web)
graph.add_node("Summarize", summarize)

graph.set_entry_point("GetQuery")
graph.add_edge("GetQuery", "Search")
graph.add_edge("Search", "Summarize")
graph.set_finish_point("Summarize")

executor = graph.compile()
```
Run the agent:

```python
state = AgentState({"query": "How does CrewAI compare to LangGraph?"})
result = executor.invoke(state)
print("SUMMARY:", result["summary"])
```
➡️ Head to your Weights & Biases dashboard to view:

- The query, search result, prompt, and summary logged at each step
- Per-node latency for the search call
- The rough token count for each summary
To make runs easy to compare later, record the prompt template and model in the run config:

```python
wandb.config.update({
    "prompt_template": "Summarize this: {text}",
    "llm_model": "gpt-3.5-turbo"
})
```
This helps when A/B testing prompt templates, model choices, or revisions of the agent graph across runs.
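For example, here is a minimal sketch of two variant runs; only the first template comes from the tutorial above, and the run names and second template are my own illustration:

```python
import wandb

variants = {
    "summarize-v1": "Summarize this: {text}",
    "summarize-v2": "Summarize the following in two sentences: {text}",
}

for name, template in variants.items():
    run = wandb.init(project="agent-observability", name=name, reinit=True)
    wandb.config.update({"prompt_template": template, "llm_model": "gpt-3.5-turbo"})
    # ... rebuild and invoke the agent with this template, logging as before ...
    run.finish()
```

Each variant then shows up as its own run, so you can compare metrics side by side in the W&B UI.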
Metrics worth tracking:

| Metric | Use Case |
|---|---|
| Token count | Optimize for cost efficiency |
| Latency per node | Identify performance bottlenecks |
| Prompt-to-output | Spot hallucinations or bad summaries |
| Version diffs | Evaluate performance between prompt revisions |
Use W&B Tables or custom dashboards for more insight.