Execution Evidence

Proof over positioning.

decide pilots are judged by operational outcomes, not demo quality. Every pilot is scoped to one support queue and instrumented with request-level traceability.


Live pilot telemetry

Core runtime snapshot from the live metrics feed. Use this panel for weekly sprint readouts and YC screenshots; a fetch sketch follows the metric list below.

Pilot intakes (24h): Inbound pilot requests captured in the last 24 hours.
Delivery success (24h): Requests delivered to the primary or backup intake destination.
Delivery failures (24h): Failures that should trigger follow-up and alert review.
Decision API signals (24h): Events tagged with decide/decision for product activity checks.
MCP signals (24h): Events tagged with MCP or policy tool names.
Total tracked events (24h): All runtime events during the latest 24-hour window.
Top events (24h): Runtime event names ranked by count.
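As a rough illustration of how this panel could be populated, the sketch below pulls a snapshot from a hypothetical metrics JSON endpoint. The URL and field names are assumptions for illustration, not the real schema.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint and field names; the real metrics JSON may differ.
METRICS_URL = "https://example.com/metrics.json"

def load_snapshot(url: str = METRICS_URL) -> dict:
    """Fetch the 24-hour runtime snapshot behind the telemetry panel."""
    with urlopen(url) as resp:
        return json.load(resp)

if __name__ == "__main__":
    snapshot = load_snapshot()
    for field in ("pilot_intakes_24h", "delivery_success_24h",
                  "delivery_failures_24h", "decision_api_signals_24h",
                  "mcp_signals_24h", "total_events_24h"):
        print(f"{field}: {snapshot.get(field, '-')}")
```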

Pilot scorecard template

Use this structure for every pilot. Keep baseline, target, and current values visible in one place for weekly readouts; a computation sketch for all three formulas follows the cards below.

Escalation Rate
Baseline / Target / Current
Formula: escalated policy tickets ÷ policy tickets handled.
Track by queue and ticket type (refund/cancel/trial).
Handle-Time Variance
Baseline / Target / Current
Formula: spread between p90 and p50 handling time.
Measure only comparable policy intents.
Dispute Readiness
Baseline / Target / Current
Formula: minutes to produce dispute evidence pack.
Evidence pack must include request_id and rationale trace.
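A minimal sketch of the three scorecard formulas, assuming hypothetical ticket records; field names such as escalated and handle_minutes are illustrative stand-ins for helpdesk export columns.

```python
# Hypothetical ticket records; real values come from helpdesk exports.
tickets = [
    {"escalated": False, "handle_minutes": 12.0},
    {"escalated": True,  "handle_minutes": 41.0},
    {"escalated": False, "handle_minutes": 15.5},
    {"escalated": False, "handle_minutes": 9.0},
]

def percentile(values, pct):
    """Nearest-rank percentile; coarse but stable for weekly readouts."""
    ranked = sorted(values)
    return ranked[round(pct / 100 * (len(ranked) - 1))]

# Escalation rate: escalated policy tickets / policy tickets handled.
escalation_rate = sum(t["escalated"] for t in tickets) / len(tickets)

# Handle-time variance: spread between p90 and p50 handling time.
times = [t["handle_minutes"] for t in tickets]
handle_time_variance = percentile(times, 90) - percentile(times, 50)

# Dispute readiness: minutes from dispute open to evidence pack ready.
dispute_opened_min, evidence_ready_min = 0.0, 18.0  # stand-in timestamps
dispute_readiness = evidence_ready_min - dispute_opened_min

print(f"escalation rate: {escalation_rate:.0%}")
print(f"handle-time variance: {handle_time_variance:.1f} min")
print(f"dispute readiness: {dispute_readiness:.0f} min")
```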

What gets measured in every pilot

Escalation rate
Definition: Percent of policy tickets escalated from the frontline queue.
Collection method: Helpdesk tags + decision run logs joined by workflow ID (join sketch below).
Cadence: Weekly.

Handle-time variance
Definition: Spread in resolution time for the same policy class.
Collection method: Ticket timestamps grouped by policy intent and verdict.
Cadence: Weekly.

Dispute readiness
Definition: Time to assemble evidence for chargeback/dispute cases.
Collection method: request_id + deterministic rationale retrieval from the audit trail.
Cadence: Per incident + monthly summary.
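A hedged sketch of the escalation-rate join, assuming simplified helpdesk and run-log records keyed by a shared workflow_id; all IDs and field names here are illustrative.

```python
# Hypothetical records; real data comes from the helpdesk and run-log stores.
helpdesk_tickets = [
    {"workflow_id": "wf-101", "tags": ["refund", "escalated"]},
    {"workflow_id": "wf-102", "tags": ["cancel"]},
]
decision_runs = [
    {"workflow_id": "wf-101", "request_id": "req-aaa", "verdict": "deny"},
    {"workflow_id": "wf-102", "request_id": "req-bbb", "verdict": "approve"},
]

# Join helpdesk tags to decision runs on workflow_id.
runs_by_workflow = {run["workflow_id"]: run for run in decision_runs}
joined = [
    {**ticket, **runs_by_workflow[ticket["workflow_id"]]}
    for ticket in helpdesk_tickets
    if ticket["workflow_id"] in runs_by_workflow
]

escalated = [row for row in joined if "escalated" in row["tags"]]
print(f"escalation rate: {len(escalated) / len(joined):.0%}")
```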

Verification model

Every run logs deterministic verdict fields, payload hash, response hash, latency, and request_id. Pilot reviews use this data to compare manual policy handling vs. routed policy execution.
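A minimal sketch of what one run record could look like, assuming SHA-256 over canonical (key-sorted) JSON; the helper names and exact fields are illustrative, not the product's actual schema.

```python
import hashlib
import json
import time
import uuid

def canonical_hash(obj: dict) -> str:
    """SHA-256 over key-sorted JSON so identical payloads hash identically."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def log_run(payload: dict, response: dict, started_s: float) -> dict:
    """Hypothetical run record carrying the fields named above."""
    return {
        "request_id": str(uuid.uuid4()),
        "verdict": response.get("verdict"),  # deterministic verdict field
        "payload_hash": canonical_hash(payload),
        "response_hash": canonical_hash(response),
        "latency_ms": round((time.monotonic() - started_s) * 1000, 2),
    }

t0 = time.monotonic()
record = log_run({"intent": "refund"}, {"verdict": "approve"}, t0)
print(json.dumps(record, indent=2))
```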

Weekly pilot readout structure

Week 0
Decision scope: Baseline extraction.
Required output: Queue definition, metric formulas, and baseline values frozen.

Weeks 1-2
Decision scope: Routed decision rollout.
Required output: Daily run logs with request_id mapping to tickets.

Weeks 3-4
Decision scope: Outcome comparison.
Required output: Before/after scorecard with variance analysis and exception list.

Minimum evidence packet

Use this package for customer sign-off, internal reviews, and fundraising diligence.

Run export
CSV export of decision runs including status, latency, and request_id (see the export sketch after this list).
Queue metrics extract
Helpdesk metrics snapshot for baseline and current pilot windows.
Replay samples
At least five ticket-level replays that show deterministic output and action mapping.
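For illustration, a tiny sketch of the run-export shape, assuming just the three columns named above; a real export would likely carry more fields.

```python
import csv
import sys

# Hypothetical decision runs; a real export is pulled from the run-log store.
runs = [
    {"request_id": "req-aaa", "status": "approve", "latency_ms": 182},
    {"request_id": "req-bbb", "status": "deny", "latency_ms": 143},
]

writer = csv.DictWriter(sys.stdout, fieldnames=["request_id", "status", "latency_ms"])
writer.writeheader()
writer.writerows(runs)
```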