Active Research

Epistemic Awareness

Teaching AI systems to know what they don't know — and to tell us about it.

The Problem We're Solving

Large Language Models hallucinate. They make confident-sounding claims about things that aren't true. In a simple chatbot, this is annoying. In an agentic workflow that's making decisions, writing code, or calling APIs, it can be dangerous.

Current approaches to this problem mostly focus on the model itself — fine-tuning, RLHF, or prompting techniques. But we believe there's a complementary approach that's been underexplored: runtime observation.

Our Approach: Runtime Epistemics

Instead of asking the model "are you sure?", we observe what's happening during execution and compute objective signals about the system's epistemic state.

# Philosophy (from our internal docs)
EpistemicAwareness = SHAKA's sensing layer (NOT standalone)

Key Insight:
- We DON'T ask the LLM to self-evaluate (unreliable)
- We DO observe runtime behavior (objective)
- We compute collapse risk from signals (deterministic)
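
To make "observe, don't ask" concrete, here is a minimal sketch in TypeScript of what a runtime observer could look like. Everything here (the RuntimeObserver class, event names, the 30-second timeout) is illustrative and assumed, not an actual API from our codebase: the point is that signals come from what actually happened during execution, not from the model's self-report.

// Minimal sketch: record objective runtime events as side effects of real behavior.
type RuntimeEvent =
  | { kind: "retry"; op: string }
  | { kind: "toolError"; op: string; message: string }
  | { kind: "timeout"; op: string; elapsedMs: number };

class RuntimeObserver {
  private events: RuntimeEvent[] = [];

  // Wrap any async operation; flag slow runs and capture thrown errors.
  async observe<T>(op: string, fn: () => Promise<T>, timeoutMs = 30_000): Promise<T> {
    const start = Date.now();
    try {
      const result = await fn();
      const elapsedMs = Date.now() - start;
      if (elapsedMs > timeoutMs) {
        this.events.push({ kind: "timeout", op, elapsedMs });
      }
      return result;
    } catch (err) {
      this.events.push({ kind: "toolError", op, message: String(err) });
      throw err;
    }
  }

  recordRetry(op: string): void {
    this.events.push({ kind: "retry", op });
  }

  snapshot(): RuntimeEvent[] {
    return [...this.events];
  }
}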

Runtime Signals We Track

Category  Signal             Description
health    Retries            Number of retry attempts for failed operations
health    Tool Errors        Errors from tool invocations
health    Timeouts           Operations exceeding expected duration
health    Stalls             Periods of no progress or activity
quality   Schema Failures    Output not matching expected schema
quality   Parse Failures     Unable to parse structured output
quality   Repairs Needed     Number of output corrections
evidence  Evidence Coverage  Ratio of claims with supporting evidence
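
As a rough sketch of how these signals could be grouped in code, here is one possible TypeScript shape. The interface and field names are ours for illustration, not a published schema:

// Illustrative shape for the signals above; counts per run unless noted.
interface EpistemicSignals {
  health: {
    retries: number;        // retry attempts for failed operations
    toolErrors: number;     // errors from tool invocations
    timeouts: number;       // operations exceeding expected duration
    stalls: number;         // periods of no progress or activity
  };
  quality: {
    schemaFailures: number; // output not matching the expected schema
    parseFailures: number;  // structured output that could not be parsed
    repairsNeeded: number;  // output corrections applied
  };
  evidence: {
    evidenceCoverage: number; // ratio of claims with supporting evidence (0-1)
  };
}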

Collapse Risk Scoring

These signals feed into a deterministic scoring system that computes a "collapse risk": a basis-point score (0-10000) estimating the probability that the current execution is heading toward failure.

# Collapse Risk Levels (ScoreBp 0-10000)
Low (0-2500): Normal operation
Medium:       Increased monitoring
High:         Mitigation activated
Critical:     Early stop possible
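
To show what a deterministic scorer might look like, here is a sketch that maps the EpistemicSignals shape above to a basis-point score and a level. The weights, the evidence penalty, and the Medium/High/Critical cutoffs are placeholder assumptions for illustration only; the only boundary stated above is the Low band at 2500.

type RiskLevel = "low" | "medium" | "high" | "critical";

// Placeholder per-signal weights in basis points (not our calibrated values).
const WEIGHTS_BP = {
  retries: 400,
  toolErrors: 800,
  timeouts: 600,
  stalls: 600,
  schemaFailures: 700,
  parseFailures: 700,
  repairsNeeded: 300,
} as const;

function collapseRiskBp(s: EpistemicSignals): number {
  let score = 0;
  score += s.health.retries * WEIGHTS_BP.retries;
  score += s.health.toolErrors * WEIGHTS_BP.toolErrors;
  score += s.health.timeouts * WEIGHTS_BP.timeouts;
  score += s.health.stalls * WEIGHTS_BP.stalls;
  score += s.quality.schemaFailures * WEIGHTS_BP.schemaFailures;
  score += s.quality.parseFailures * WEIGHTS_BP.parseFailures;
  score += s.quality.repairsNeeded * WEIGHTS_BP.repairsNeeded;
  // Missing evidence raises risk: zero coverage adds up to 2000 bp (placeholder).
  score += Math.round((1 - s.evidence.evidenceCoverage) * 2000);
  return Math.min(10_000, score); // clamp to the 0-10000 scale
}

function riskLevel(scoreBp: number): RiskLevel {
  if (scoreBp <= 2500) return "low";    // Low band from the table above
  if (scoreBp <= 5000) return "medium"; // placeholder cutoff
  if (scoreBp <= 7500) return "high";   // placeholder cutoff
  return "critical";
}

Because the mapping is a fixed function of observed counts, the same execution trace always yields the same score, which is what makes the risk signal auditable. The open question of how to set these weights per task type is exactly the calibration problem listed below.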

Open Questions (We Don't Have All the Answers)

  • How do we calibrate signal weights across different task types?
  • Can we train a meta-model to predict collapse risk more accurately?
  • What's the right balance between false positives and missed failures?
  • How do epistemic signals differ across model providers (Claude vs GPT vs Gemini)?

Want to explore this with us?

We're looking for collaborators and early testers.