Design Phase

Observability-Driven Development

If you can't observe it, you can't debug it. If you can't debug it, you can't ship it.

"Observability is not optional — it's the control plane for AI."

Traditional logging captures what happened. Observability captures why it happened, how long it took, what it cost, and whether it worked. For non-deterministic AI systems, this isn't a nice-to-have — it's the only way to understand and improve your workflows.

The Four Pillars of AI Observability

Traces

End-to-end request flow through the DAG. See exactly how tasks connect and data flows.

workflow.run → task.analyze → agent.call → tool.read → agent.response

Spans

Individual operations with timing, parent-child relationships, and attributes.

span: llm.call | duration: 2.3s | tokens: 4521 | model: claude-sonnet
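
As a rough illustration of how timing, nesting, and attributes fit together, here is a minimal sketch in Python. The Tracer and Span classes are hypothetical names invented for this example and are not Nika's API; they only show one way a span can record its duration and nest under whichever span is currently open.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class Span:
    """One timed operation; nested operations are stored in `children`."""
    name: str
    attributes: dict[str, Any] = field(default_factory=dict)
    children: list["Span"] = field(default_factory=list)
    duration_s: float = 0.0


class Tracer:
    """Hypothetical tracer: keeps a stack of open spans so new spans nest correctly."""

    def __init__(self) -> None:
        self.roots: list[Span] = []
        self._stack: list[Span] = []

    @contextmanager
    def span(self, name: str, **attributes: Any) -> Iterator[Span]:
        parent = self._stack[-1] if self._stack else None
        current = Span(name=name, attributes=dict(attributes))
        # Attach to the open parent span, or record as a new root.
        if parent is not None:
            parent.children.append(current)
        else:
            self.roots.append(current)
        self._stack.append(current)
        start = time.perf_counter()
        try:
            yield current
        finally:
            # Record the duration even if the wrapped operation raised.
            current.duration_s = time.perf_counter() - start
            self._stack.pop()


tracer = Tracer()
with tracer.span("workflow.run"):
    with tracer.span("task.analyze"):
        with tracer.span("llm.call", model="claude-sonnet") as llm:
            llm.attributes["tokens"] = 4521  # attributes can be added while the span is open
```

After this runs, tracer.roots holds a single workflow.run span whose children mirror the call nesting, which is the parent-child structure the trace view relies on.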

Metrics

Aggregated measurements: latency percentiles, token usage, error rates, costs.

p99_latency: 4.2s | tokens_per_task: 8500 | error_rate: 0.02
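
To make the aggregation concrete, the sketch below derives the three example metrics from a list of per-task samples. The sample fields (latency_s, tokens, ok) and the nearest-rank percentile method are assumptions made for this example; the source does not specify how Nika computes its percentiles.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Aggregate per-task samples into latency, token, and error metrics."""
    count = len(samples)
    return {
        "p99_latency": percentile([s["latency_s"] for s in samples], 99),
        "tokens_per_task": sum(s["tokens"] for s in samples) / count,
        "error_rate": sum(1 for s in samples if not s["ok"]) / count,
    }


# Illustrative samples only, not measurements from a real workflow.
samples = [
    {"latency_s": 1.0, "tokens": 8000, "ok": True},
    {"latency_s": 2.0, "tokens": 9000, "ok": True},
    {"latency_s": 3.0, "tokens": 8500, "ok": False},
    {"latency_s": 4.0, "tokens": 8500, "ok": True},
]
summary = summarize(samples)
```

With only four samples the p99 is simply the slowest task; the percentile becomes meaningful once there are hundreds of samples per window.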

Logs

Structured events with context. Every decision, every tool call, every output.

{"level":"info","task":"analyze","action":"tool_call","tool":"Read"}
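
A log line like the one above can be produced with Python's standard logging module and a small JSON formatter. This is a sketch, not Nika's logger: the JsonFormatter class, the "context" key passed through extra, and the logger name are all assumptions chosen for the example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object: level, message, then any context fields."""

    def format(self, record):
        event = {"level": record.levelname.lower(), "message": record.getMessage()}
        # `context` is a hypothetical attribute supplied through `extra=` below.
        event.update(getattr(record, "context", {}))
        return json.dumps(event)


stream = io.StringIO()  # stands in for stdout or a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("workflow_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

logger.info(
    "tool call",
    extra={"context": {"task": "analyze", "action": "tool_call", "tool": "Read"}},
)
```

Because every event is a single JSON object per line, the output can be filtered by task or tool with ordinary JSON tooling instead of regular expressions.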

Why AI Observability is Different

Non-Deterministic Outputs

TRADITIONAL:

Same input, different outputs. Hard to reproduce bugs.

OBSERVABLE:

Capture full context: prompt, temperature, seed, model version.
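
One way to capture that context is a small immutable record stored alongside each call, with a fingerprint that makes identical configurations easy to spot. The CallContext class and its fingerprint scheme are assumptions for illustration; only the four captured fields come from the text above.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CallContext:
    """Everything needed to attempt a replay of one LLM call (hypothetical record shape)."""
    prompt: str
    temperature: float
    seed: int
    model_version: str

    def fingerprint(self) -> str:
        # Stable hash of all fields: equal contexts always yield the same fingerprint.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


first = CallContext("Review src/main.rs", temperature=0.2, seed=7, model_version="claude-sonnet-4-5")
same = CallContext("Review src/main.rs", temperature=0.2, seed=7, model_version="claude-sonnet-4-5")
reseeded = CallContext("Review src/main.rs", temperature=0.2, seed=8, model_version="claude-sonnet-4-5")
```

Two runs with the same fingerprint but different outputs point at model non-determinism; different fingerprints point at a configuration change.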

Hidden Reasoning

TRADITIONAL:

The LLM is a black box. Why did it make that decision?

OBSERVABLE:

Trace chain-of-thought, tool selections, and intermediate outputs.

Cascading Failures

TRADITIONAL:

Error in task 3 causes task 7 to fail. Root cause is obscured.

OBSERVABLE:

Span parent-child relationships show exact failure propagation.
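
As a sketch of how that propagation can be read off a trace, the function below walks a span tree and returns the path from the failing root down to the deepest failed span. The "status" field and the dictionary layout are assumptions for this example; the trace format shown later on this page does not define a status attribute.

```python
def failure_path(span):
    """Return span names from `span` down to its deepest failed descendant, or [] if it succeeded."""
    if span.get("status") != "error":
        return []
    for child in span.get("children", []):
        path = failure_path(child)
        if path:
            # A failed child explains this span's failure, so keep descending.
            return [span["name"]] + path
    # No failed children: this span is where the failure originated.
    return [span["name"]]


# Illustrative trace: the workflow failed because a tool call inside one task failed.
trace = {
    "name": "workflow.run",
    "status": "error",
    "children": [
        {"name": "task.analyze", "status": "ok", "children": []},
        {
            "name": "task.review",
            "status": "error",
            "children": [{"name": "tool.Read", "status": "error", "children": []}],
        },
    ],
}
root_cause_path = failure_path(trace)
```

The last element of the path is the candidate root cause; everything before it is a span that failed only because something beneath it did.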

Cost Attribution

TRADITIONAL:

API bill is $5000. Which workflow is responsible?

OBSERVABLE:

Per-task token metrics with cost attribution and anomaly detection.
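
The idea can be sketched in a few lines: price each call from its token counts, total the cost per workflow, and flag any workflow that costs far more than the typical one. The prices, the record fields, and the "three times the median" rule are all assumptions for illustration, not Nika's pricing table or detection method.

```python
from collections import defaultdict
from statistics import median

# Hypothetical USD prices per million tokens; substitute your provider's real rates.
PRICE_PER_MILLION = {"input": 3.00, "output": 15.00}


def call_cost(input_tokens, output_tokens):
    """Cost in USD of one call, given its token counts."""
    return (input_tokens * PRICE_PER_MILLION["input"]
            + output_tokens * PRICE_PER_MILLION["output"]) / 1_000_000


def cost_by_workflow(calls):
    """Sum call costs per workflow so the bill can be attributed."""
    totals = defaultdict(float)
    for call in calls:
        totals[call["workflow"]] += call_cost(call["input_tokens"], call["output_tokens"])
    return dict(totals)


def anomalies(totals, factor=3.0):
    """Flag workflows costing more than `factor` times the median workflow."""
    threshold = factor * median(totals.values())
    return sorted(name for name, cost in totals.items() if cost > threshold)


# Illustrative call records only.
calls = [
    {"workflow": "code-review", "input_tokens": 1000, "output_tokens": 200},
    {"workflow": "code-review", "input_tokens": 1000, "output_tokens": 200},
    {"workflow": "summarize", "input_tokens": 2000, "output_tokens": 100},
    {"workflow": "bulk-refactor", "input_tokens": 400_000, "output_tokens": 50_000},
]
totals = cost_by_workflow(calls)
flagged = anomalies(totals)
```

A median-based threshold is used here because a single runaway workflow would inflate a mean-based one and hide itself.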

OpenTelemetry Compatible

Nika is designed to export traces, spans, and metrics in OpenTelemetry format, the industry standard for observability. That lets you send data to Jaeger, Zipkin, Datadog, Honeycomb, Grafana, or any OTLP-compatible backend, so the observability stack is free of vendor lock-in too. OpenTelemetry export is still in design; see Implementation Status.

What Nika Captures

trace-example.json

{
  "traceId": "abc123...",
  "spans": [
    {
      "name": "workflow.run",
      "duration": "45.2s",
      "attributes": {
        "workflow.id": "code-review",
        "workflow.tasks": 5,
        "workflow.provider": "anthropic"
      },
      "children": [
        {
          "name": "task.analyze",
          "duration": "12.3s",
          "attributes": {
            "task.id": "analyze",
            "task.scope": "minimal",
            "agent.model": "claude-sonnet-4-5",
            "agent.tokens.input": 4521,
            "agent.tokens.output": 1823,
            "agent.cost": 0.024,
            "agent.turns": 3
          },
          "children": [
            {
              "name": "tool.Read",
              "duration": "0.8s",
              "attributes": {
                "tool.path": "src/main.rs",
                "tool.bytes": 12456
              }
            }
          ]
        }
      ]
    }
  ],
  "metrics": {
    "total_cost": 0.087,
    "total_tokens": 23456,
    "p99_latency": "18.2s",
    "error_rate": 0.0
  }
}
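
Because the spans nest through "children", a short recursive walk is enough to turn a trace like this into flat rows for reporting. The sketch below uses a trimmed copy of the example above and assumes durations are always strings ending in "s", as they are in that example; a real trace file may use a different encoding.

```python
import json

# Trimmed copy of the example trace, keeping only names, durations, and nesting.
TRACE_JSON = """
{
  "spans": [
    {
      "name": "workflow.run",
      "duration": "45.2s",
      "children": [
        {
          "name": "task.analyze",
          "duration": "12.3s",
          "children": [{"name": "tool.Read", "duration": "0.8s"}]
        }
      ]
    }
  ]
}
"""


def flatten(span, depth=0):
    """Yield (depth, name, seconds) for a span and all of its descendants, parents first."""
    yield depth, span["name"], float(span["duration"].rstrip("s"))
    for child in span.get("children", []):
        yield from flatten(child, depth + 1)


trace = json.loads(TRACE_JSON)
rows = [row for root in trace["spans"] for row in flatten(root)]

for depth, name, seconds in rows:
    print(f"{'  ' * depth}{name}  {seconds:.1f}s")
```

The depth column preserves the hierarchy, so the same rows can drive an indented view like the TUI or be filtered to a single level for per-task totals.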

Real-Time in the TUI

Nika's terminal UI shows observability data in real-time as workflows execute:

┌─ Workflow: code-review ─────────────────────────────────────┐
│                                                             │
│  ◎ analyze  [minimal]  ████████████░░░░░  12.3s  $0.024    │
│    ├─ Read(src/main.rs)      0.8s                          │
│    ├─ Grep(TODO|FIXME)       0.3s                          │
│    └─ LLM(claude-sonnet)    11.2s  4521→1823 tokens        │
│                                                             │
│  ◈ review   [default]  ████░░░░░░░░░░░░░   4.1s  $0.012    │
│    └─ LLM(claude-sonnet)     4.1s  processing...           │
│                                                             │
│  ◎ report   [minimal]  ░░░░░░░░░░░░░░░░░  pending          │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│  Tokens: 6,344  │  Cost: $0.036  │  Time: 16.4s            │
└─────────────────────────────────────────────────────────────┘

Implementation Status

Real-time TUI with progress — Implemented
Token and cost tracking — Implemented
Structured trace output — In Design
OpenTelemetry export — In Design
Historical trace storage — Future
