Observability-Driven Development
If you can't observe it, you can't debug it. If you can't debug it, you can't ship it.
"Observability is not optional — it's the control plane for AI."
Traditional logging captures what happened. Observability captures why it happened, how long it took, what it cost, and whether it worked. For non-deterministic AI systems, this isn't a nice-to-have — it's the only way to understand and improve your workflows.
The Four Pillars of AI Observability
Traces
End-to-end request flow through the DAG. See exactly how tasks connect and data flows.
workflow.run → task.analyze → agent.call → tool.read → agent.response
Spans
Individual operations with timing, parent-child relationships, and attributes.
span: llm.call | duration: 2.3s | tokens: 4521 | model: claude-sonnet
Metrics
Aggregated measurements: latency percentiles, token usage, error rates, costs.
p99_latency: 4.2s | tokens_per_task: 8500 | error_rate: 0.02
Logs
Structured events with context. Every decision, every tool call, every output.
{"level":"info","task":"analyze","action":"tool_call","tool":"Read"}
Why AI Observability is Different
Non-Deterministic Outputs
Same input, different outputs. Hard to reproduce bugs.
Capture full context: prompt, temperature, seed, model version.
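A minimal sketch of what "capture full context" could look like in practice. The `CallContext` class and its field names are hypothetical, not part of Nika; the idea is to store the four inputs named above together with a stable fingerprint, so two runs can be compared even when their outputs differ.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class CallContext:
    """Inputs needed to attempt a replay of one LLM call (hypothetical schema)."""
    prompt: str
    temperature: float
    seed: Optional[int]
    model_version: str

    def fingerprint(self) -> str:
        # Hash a canonical JSON encoding so identical contexts get identical IDs.
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


ctx = CallContext(
    prompt="Summarize src/main.rs",
    temperature=0.2,
    seed=42,
    model_version="claude-sonnet-4-5",
)

# One record per call: the fingerprint plus the raw fields it was derived from.
record = {"fingerprint": ctx.fingerprint(), **asdict(ctx)}
print(json.dumps(record))
```

A matching fingerprint only says the inputs were identical. It does not guarantee identical output, which is why the output itself still needs to be logged alongside it.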
Hidden Reasoning
LLMs are black boxes. Why did it make that decision?
Trace chain-of-thought, tool selections, and intermediate outputs.
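One way to make tool selections and intermediate outputs traceable is to emit a structured event for each of them. The sketch below is an assumption about how that might be done, not Nika's logger: `make_event`, the `ts` field, and the action names other than `tool_call` are invented for illustration.

```python
import json
import time


def make_event(task: str, action: str, **fields) -> str:
    """Return one JSON log line for a decision, tool call, or output."""
    event = {"ts": time.time(), "level": "info", "task": task, "action": action}
    event.update(fields)  # extra context, e.g. which tool was chosen and why
    return json.dumps(event)


# Illustrative sequence for a single task; the values are made up.
lines = [
    make_event("analyze", "tool_select", tool="Read", reason="need file contents"),
    make_event("analyze", "tool_call", tool="Read", path="src/main.rs"),
    make_event("analyze", "intermediate_output", summary="found 2 TODO comments"),
]
for line in lines:
    print(line)
```

Because every line is valid JSON with the same core keys, the sequence of decisions for a task can be reconstructed later by filtering on `task` and sorting on `ts`.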
Cascading Failures
Error in task 3 causes task 7 to fail. Root cause is obscured.
Span parent-child relationships show exact failure propagation.
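To illustrate how parent-child links expose the root cause, here is a small sketch that walks a span tree and follows failing spans down to the deepest one. The `Span` class and its `status` values are hypothetical, and the sketch assumes children are listed in start order, so the first failing child is the earliest failure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    name: str
    status: str = "ok"  # "ok" or "error" (hypothetical status values)
    children: List["Span"] = field(default_factory=list)


def root_cause_path(span: Span) -> Optional[List[str]]:
    """Return span names from `span` down to the deepest failing descendant."""
    if span.status != "error":
        return None
    for child in span.children:
        path = root_cause_path(child)
        if path is not None:
            # Follow the first failing child; later failures are treated as fallout.
            return [span.name] + path
    return [span.name]  # failing span with no failing children: the root cause


trace = Span("workflow.run", "error", [
    Span("task.analyze", "ok"),
    Span("task.review", "error", [
        Span("agent.call", "error", [Span("tool.read", "error")]),
    ]),
    Span("task.report", "error"),  # failed only because its input was missing
])

print(" -> ".join(root_cause_path(trace)))
# prints: workflow.run -> task.review -> agent.call -> tool.read
```

The downstream `task.report` failure never appears in the path, which is the point: the tree separates the originating error from the failures it caused.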
Cost Attribution
API bill is $5000. Which workflow is responsible?
Per-task token metrics with cost attribution and anomaly detection.
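A rough sketch of cost attribution and a very simple anomaly check, assuming each run has already been recorded as a `(workflow, task, cost)` tuple. The record shape, the sample values, and the "five times the median" threshold are all illustrative assumptions rather than Nika behavior.

```python
from collections import defaultdict
from statistics import median

# (workflow, task, cost in USD); illustrative values only.
runs = [
    ("code-review", "analyze", 0.024),
    ("code-review", "review", 0.012),
    ("code-review", "analyze", 0.026),
    ("doc-gen", "draft", 0.040),
    ("code-review", "analyze", 0.310),  # suspiciously expensive run
]


def cost_by_workflow(records):
    """Sum cost per workflow to answer 'which workflow is responsible?'."""
    totals = defaultdict(float)
    for workflow, _task, cost in records:
        totals[workflow] += cost
    return dict(totals)


def anomalies(records, factor=5.0):
    """Flag runs costing more than `factor` times the median for the same task."""
    by_task = defaultdict(list)
    for workflow, task, cost in records:
        by_task[(workflow, task)].append(cost)
    return [
        (workflow, task, cost)
        for workflow, task, cost in records
        if cost > factor * median(by_task[(workflow, task)])
    ]


for workflow, total in cost_by_workflow(runs).items():
    print(f"{workflow}: ${total:.3f}")
print("anomalies:", anomalies(runs))
```

A median-based threshold is used here because a single runaway run would inflate a mean and hide itself. Real systems would likely want a time window and a minimum sample size as well.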
OpenTelemetry Compatible
Nika's traces, spans, and metrics are designed for export in OpenTelemetry format, the industry standard for observability. The target is Jaeger, Zipkin, Datadog, Honeycomb, Grafana, or any other OTLP-compatible backend, so your observability stack avoids vendor lock-in too. Export is still in design; see Implementation Status.
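One practical difference worth noting: OpenTelemetry represents a trace as a flat list of spans linked by parent IDs, not as nested `children`. The sketch below shows that conversion using field names loosely modeled on OTLP/JSON (`traceId`, `spanId`, `parentSpanId`, typed attribute values). It is an illustration under those assumptions, not Nika's exporter, and it omits the timestamps and the resource and scope wrappers that a real OTLP payload requires.

```python
import secrets


def to_attribute(key, value):
    # OTLP/JSON wraps each value in a typed object; 64-bit ints travel as strings.
    if isinstance(value, bool):
        typed = {"boolValue": value}
    elif isinstance(value, int):
        typed = {"intValue": str(value)}
    elif isinstance(value, float):
        typed = {"doubleValue": value}
    else:
        typed = {"stringValue": str(value)}
    return {"key": key, "value": typed}


def flatten(span, trace_id, parent_id=None, out=None):
    """Turn a nested span (with "children") into a flat list linked by parentSpanId."""
    out = [] if out is None else out
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flat = {
        "traceId": trace_id,
        "spanId": span_id,
        "name": span["name"],
        "attributes": [to_attribute(k, v) for k, v in span.get("attributes", {}).items()],
    }
    if parent_id is not None:
        flat["parentSpanId"] = parent_id  # root spans have no parent
    out.append(flat)
    for child in span.get("children", []):
        flatten(child, trace_id, span_id, out)
    return out


nested = {
    "name": "workflow.run",
    "attributes": {"workflow.id": "code-review", "workflow.tasks": 5},
    "children": [{"name": "task.analyze", "attributes": {"agent.cost": 0.024}}],
}
spans = flatten(nested, trace_id=secrets.token_hex(16))
for s in spans:
    print(s["name"], "parent:", s.get("parentSpanId"))
```

In a real integration, an OpenTelemetry SDK would normally generate the IDs and handle serialization; the manual version here only makes the data model visible.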
What Nika Captures
{
  "traceId": "abc123...",
  "spans": [
    {
      "name": "workflow.run",
      "duration": "45.2s",
      "attributes": {
        "workflow.id": "code-review",
        "workflow.tasks": 5,
        "workflow.provider": "anthropic"
      },
      "children": [
        {
          "name": "task.analyze",
          "duration": "12.3s",
          "attributes": {
            "task.id": "analyze",
            "task.scope": "minimal",
            "agent.model": "claude-sonnet-4-5",
            "agent.tokens.input": 4521,
            "agent.tokens.output": 1823,
            "agent.cost": 0.024,
            "agent.turns": 3
          },
          "children": [
            {
              "name": "tool.Read",
              "duration": "0.8s",
              "attributes": {
                "tool.path": "src/main.rs",
                "tool.bytes": 12456
              }
            }
          ]
        }
      ]
    }
  ],
  "metrics": {
    "total_cost": 0.087,
    "total_tokens": 23456,
    "p99_latency": "18.2s",
    "error_rate": 0.0
  }
}
Real-Time in the TUI
Nika's terminal UI shows observability data in real time as workflows execute:
┌─ Workflow: code-review ─────────────────────────────────────┐
│ │
│ ◎ analyze [minimal] ████████████░░░░░ 12.3s $0.024 │
│ ├─ Read(src/main.rs) 0.8s │
│ ├─ Grep(TODO|FIXME) 0.3s │
│ └─ LLM(claude-sonnet) 11.2s 4521→1823 tokens │
│ │
│ ◈ review [default] ████░░░░░░░░░░░░░ 4.1s $0.012 │
│ └─ LLM(claude-sonnet) 4.1s processing... │
│ │
│ ◎ report [minimal] ░░░░░░░░░░░░░░░░░ pending │
│ │
├─────────────────────────────────────────────────────────────┤
│ Tokens: 6,344 │ Cost: $0.036 │ Time: 16.4s │
└─────────────────────────────────────────────────────────────┘
Implementation Status
- Real-time TUI with progress — Implemented
- Token and cost tracking — Implemented
- Structured trace output — In Design
- OpenTelemetry export — In Design
- Historical trace storage — Future