01 — THE RELIABILITY PLATFORM FOR AI AGENTS

Recover from every failure.

obsrv catches failures the moment they appear, traces the cause, and tells you exactly what to change.

Book a call

$npx @theta-lab/obsrv setup

Installs the obsrv Codex skill for instrumenting AI platforms.

Live system metrics

updating now

Throughputtrc/min

1,247trc/min

Latencyms p50

92ms p50

Error%

1.4%

Uptime%

99.94%

02 — WHY OBSRV

From failure
to fix. In one platform.

Most observability tools stop at “something is wrong.” obsrv closes the loop — detect the regression, diagnose the cause, suggest the fix, verify it works. Reliability, not just dashboards.

01DETECT
Catches regressions automatically the moment a release ships — failure clusters and traces attached.
02DIAGNOSE
Replay any run end-to-end. Text, images, audio, video, and tool calls inline.
03FIX
Suggests the change for every cluster — prompt edit, tool fix, or guardrail. With the diff.
04VERIFY
Re-run evals against the failing traces before you ship. Confirm the fix actually worked.
05SECURE
Runs in your VPC. Your traces never leave your network. Your perimeter, your keys.

03 — WHAT WE RECORD

Everything obsrv records.

Trace replay

Step-perfect timeline of every run.

Evaluations

Synthetic + observed signals, side by side.

Cluster discovery

Failure modes from real traffic — no labels.

Multimodal

Image, audio, video, sensor — inline in context.

MODEL OBS-1 · S/N 01HXR4Z9CK · REV 04 · 2026

04 — DETECT & FIX

Every alert ships
with the fix.

obsrv catches the regression, traces the cluster, identifies the release that caused it, and tells you exactly what to change. Reliability without the guesswork.

↓ active alerts · last 60 min

ALERT 01FAILincorrect_tool_selection14:23Z
Agent calls cancel_order on the most recent order when user references a previous one.
RELEASE v3.2.1·task_adherence −12%
SUGGESTED FIX
Require explicit order_id confirmation before cancel_order — patch tool schema with a required disambiguation step.
ALERT 02WARNgoal_drift14:18Z
Multi-turn conversations drift into unrelated billing topics after step 7.
RELEASE v3.2.1·context_relevance −4%
SUGGESTED FIX
Add a goal-recap turn every 5 steps; anchor system prompt with the original task before each tool call.
ALERT 03FAILfabricated_field14:11Z
Agent invents tracking_number values when shipping_status returns null.
RELEASE v3.2.0·csat −0.6
SUGGESTED FIX
Return a graceful fallback message when shipping_status is null instead of free-generating — gate the field behind a null check.

05 — EVALUATIONS

Measure what actually matters.

obsrv pairs synthetic evaluators with the real signals your users send back — refunds, escalations, thumbs-down, items purchased.

Synthetic + observed signals
LLM-as-judge metrics side by side with the events your users actually trigger.
Per-release scoring
Compare regressions across prompt versions and releases in one click.
Stable metric API
Record outcomes from any language — same pipeline as traces.

EVAL PANEL·last 24h · prod

REL v3.2.1

Synthetic

Unsupported requestFAIL

User frustrationFAIL

Context retrievalPASS

Response coherencePASS

Tool selectionPASS

Observed

Refund requestednow

Plan upgradednow

Order placed1m

Incorrect info reported12s

Conversation reopened2m

ADHERENCE

78%

P50 LATENCY

92ms

ERR RATE

1.4%

06 — CLUSTERS

Failure patterns, found automatically.

No predefined categories. No manual tagging. obsrv groups every trace by behaviour as it lands — so you see the failure patterns before you know to ask about them.

Continuous embedding
Every trace embedded as it lands; clusters update as your traffic shifts.
Auto-labelled
Cluster names generated from representative behaviour, not template guesses.
Drill straight to traces
Jump from any cluster to the underlying runs and replay them in context.

FAILURE CLUSTERS·440 traces · 7 clusters

last 24h · prod

CLUSTERS
incorrect_tool_selection127
goal_drift84
fabricated_field62
shipping_inquiry89
unsupported_request51
infinite_loop9
checkout_question38

07 — SECURITY

Your data never
leaves your network.

obsrv runs inside your environment — your VPC, on-prem, or air-gapped. Traces, prompts, and outputs stay where they are. We never see them.

Talk to engineeringSOC2 Type II · in process

PERIMETER · SEALED

customer-vpc · primary

S/01
Zero data egress
Traces, prompts, outputs, and tool I/O stay inside your network. obsrv runs in your VPC.
S/02
No subprocessors
No third-party storage, no managed search vendor, no external embedding API.
S/03
Inherits your perimeter
Your IAM, your KMS keys, your audit logs. obsrv plugs in — it doesn't replace.
S/04
No vendor backdoor
thetalab has no read access to your traces. Updates ship as signed images.
S/05
Air-gap supported
Sovereign and regulated workloads run fully offline with mirrored update channels.

08 — DEPLOY

Ship the agent.
obsrv will record everything.

Drop the SDK in three lines. The recorder lights up the moment traffic starts flowing.

Book a call Sign in

$npx @theta-lab/obsrv setup

FDR · OBS-1

REC
tr_01HXR4Z9CK
support_agent · 4.2s
✗ wrong_order

S/N 01HXR4Z9CK

Recover from every failure.

obsrv catches failures the moment they appear, traces the cause, and tells you exactly what to change.

From failureto fix. In one platform.