01 — THE RELIABILITY PLATFORM FOR AI AGENTS

Recover from every failure.

obsrv catches failures the moment they appear, traces the cause, and tells you exactly what to change.

Live system metrics
Throughputtrc/min
02k
1,247
Latencyms p50
0500
92
Error%
010
1.4
Uptime%
0100
99.94
02 — WHY OBSRV

From failure
to fix. In one platform.

Most observability tools stop at “something is wrong.” obsrv closes the loop — detect the regression, diagnose the cause, suggest the fix, verify it works. Reliability, not just dashboards.

  • 01DETECT
    Catches regressions automatically the moment a release ships — failure clusters and traces attached.
  • 02DIAGNOSE
    Replay any run end-to-end. Text, images, audio, video, and tool calls inline.
  • 03FIX
    Suggests the change for every cluster — prompt edit, tool fix, or guardrail. With the diff.
  • 04VERIFY
    Re-run evals against the failing traces before you ship. Confirm the fix actually worked.
  • 05SECURE
    Runs in your VPC. Your traces never leave your network. Your perimeter, your keys.
03 — WHAT WE RECORD

Everything obsrv records.

A
Trace replay
Step-perfect timeline of every run.
B
Evaluations
Synthetic + observed signals, side by side.
C
Cluster discovery
Failure modes from real traffic — no labels.
D
Multimodal
Image, audio, video, sensor — inline in context.
MODEL OBS-1 · S/N 01HXR4Z9CK · REV 04 · 2026
FLIGHT DATA RECORDERPWRRECCLUMODEL OBS-1 · S/N 01HXR4Z9CKDO NOT OPEN · INSPECT VIA OBSRVATrace replayStep-perfect timeline of every run.BEvaluationsSynthetic + observed signals.CCluster discoveryFailure modes, surfaced from real traffic.DMultimodal evidenceImage, audio, video, sensor — inline.DRAWINGobsrv black box · OBS-1REV04 · 2026
04 — DETECT & FIX

Every alert ships
with the fix.

obsrv catches the regression, traces the cluster, identifies the release that caused it, and tells you exactly what to change. Reliability without the guesswork.

↓ active alerts · last 60 min
  • ALERT 01FAILincorrect_tool_selection14:23Z
    Agent calls cancel_order on the most recent order when user references a previous one.
    RELEASE v3.2.1·task_adherence −12%
    SUGGESTED FIX
    Require explicit order_id confirmation before cancel_order — patch tool schema with a required disambiguation step.
  • ALERT 02WARNgoal_drift14:18Z
    Multi-turn conversations drift into unrelated billing topics after step 7.
    RELEASE v3.2.1·context_relevance −4%
    SUGGESTED FIX
    Add a goal-recap turn every 5 steps; anchor system prompt with the original task before each tool call.
  • ALERT 03FAILfabricated_field14:11Z
    Agent invents tracking_number values when shipping_status returns null.
    RELEASE v3.2.0·csat −0.6
    SUGGESTED FIX
    Return a graceful fallback message when shipping_status is null instead of free-generating — gate the field behind a null check.
05 — EVALUATIONS

Measure what actually matters.

obsrv pairs synthetic evaluators with the real signals your users send back — refunds, escalations, thumbs-down, items purchased.

  • Synthetic + observed signals
    LLM-as-judge metrics side by side with the events your users actually trigger.
  • Per-release scoring
    Compare regressions across prompt versions and releases in one click.
  • Stable metric API
    Record outcomes from any language — same pipeline as traces.
EVAL PANEL·last 24h · prod
REL v3.2.1
Synthetic
Unsupported requestFAIL
User frustrationFAIL
Context retrievalPASS
Response coherencePASS
Tool selectionPASS
Observed
Refund requestednow
22
Plan upgradednow
64
Order placed1m
71
Incorrect info reported12s
19
Conversation reopened2m
41
ADHERENCE
78%
P50 LATENCY
92ms
ERR RATE
1.4%
06 — CLUSTERS

Failure patterns, found automatically.

No predefined categories. No manual tagging. obsrv groups every trace by behaviour as it lands — so you see the failure patterns before you know to ask about them.

  • Continuous embedding
    Every trace embedded as it lands; clusters update as your traffic shifts.
  • Auto-labelled
    Cluster names generated from representative behaviour, not template guesses.
  • Drill straight to traces
    Jump from any cluster to the underlying runs and replay them in context.
FAILURE CLUSTERS·440 traces · 7 clusters
last 24h · prod
  • CLUSTERS
  • incorrect_tool_selection127
  • goal_drift84
  • fabricated_field62
  • shipping_inquiry89
  • unsupported_request51
  • infinite_loop9
  • checkout_question38
07 — SECURITY

Your data never
leaves your network.

obsrv runs inside your environment — your VPC, on-prem, or air-gapped. Traces, prompts, and outputs stay where they are. We never see them.

Talk to engineeringSOC2 Type II · in process
PERIMETER · SEALED
customer-vpc · primary
  • S/01
    Zero data egress
    Traces, prompts, outputs, and tool I/O stay inside your network. obsrv runs in your VPC.
  • S/02
    No subprocessors
    No third-party storage, no managed search vendor, no external embedding API.
  • S/03
    Inherits your perimeter
    Your IAM, your KMS keys, your audit logs. obsrv plugs in — it doesn't replace.
  • S/04
    No vendor backdoor
    thetalab has no read access to your traces. Updates ship as signed images.
  • S/05
    Air-gap supported
    Sovereign and regulated workloads run fully offline with mirrored update channels.
08 — DEPLOY

Ship the agent.
obsrv will record everything.

Drop the SDK in three lines. The recorder lights up the moment traffic starts flowing.

$pip install theta-obsrv·npm i @theta-lab/obsrv
FDR · OBS-1
REC
tr_01HXR4Z9CK
support_agent · 4.2s
✗ wrong_order