Execution Evidence

Proof over positioning.

decide pilots are judged by operational outcomes, not demo quality. Every pilot is scoped to one support queue and instrumented with request-level traceability.


Live pilot telemetry

Core runtime snapshot from the live metrics feed. Use this panel for weekly sprint readouts and YC screenshots; a fetch sketch follows the metric list below.

Pilot intakes (24h): Inbound pilot requests captured in the last 24 hours.
Delivery success (24h): Requests delivered to the primary or backup intake destination.
Delivery failures (24h): Failures that should trigger follow-up and alert review.
Decision API signals (24h): Events tagged with decide/decision for product activity checks.
MCP signals (24h): Events tagged with MCP or policy tool names.
Total tracked events (24h): All runtime events during the latest 24-hour window.
Top events (24h): Runtime event names ranked by count.
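As a rough illustration of how this panel could be populated, the sketch below pulls a snapshot from a hypothetical metrics JSON endpoint. The URL and field names are assumptions for illustration, not the real schema.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint and field names; the real metrics JSON may differ.
METRICS_URL = "https://example.com/metrics.json"

def load_snapshot(url: str = METRICS_URL) -> dict:
    """Fetch the 24-hour runtime snapshot behind the telemetry panel."""
    with urlopen(url) as resp:
        return json.load(resp)

if __name__ == "__main__":
    snapshot = load_snapshot()
    for field in ("pilot_intakes_24h", "delivery_success_24h",
                  "delivery_failures_24h", "decision_api_signals_24h",
                  "mcp_signals_24h", "total_events_24h"):
        print(f"{field}: {snapshot.get(field, '-')}")
```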

Pilot scorecard template

Use this structure for every pilot. Keep baseline, target, and current values visible in one place for weekly readouts; a computation sketch for all three formulas follows the cards below.

Escalation Rate
Baseline / Target / Current
Formula: escalated policy tickets ÷ policy tickets handled.
Track by queue and ticket type (refund/cancel/trial).
Handle-Time Variance
Baseline / Target / Current
Formula: spread between p90 and p50 handling time.
Measure only comparable policy intents.
Dispute Readiness
Baseline / Target / Current
Formula: minutes to produce dispute evidence pack.
Evidence pack must include request_id and rationale trace.
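A minimal sketch of the three scorecard formulas, assuming hypothetical ticket records; field names such as escalated and handle_minutes are illustrative stand-ins for helpdesk export columns.

```python
# Hypothetical ticket records; real values come from helpdesk exports.
tickets = [
    {"escalated": False, "handle_minutes": 12.0},
    {"escalated": True,  "handle_minutes": 41.0},
    {"escalated": False, "handle_minutes": 15.5},
    {"escalated": False, "handle_minutes": 9.0},
]

def percentile(values, pct):
    """Nearest-rank percentile; coarse but stable for weekly readouts."""
    ranked = sorted(values)
    return ranked[round(pct / 100 * (len(ranked) - 1))]

# Escalation rate: escalated policy tickets / policy tickets handled.
escalation_rate = sum(t["escalated"] for t in tickets) / len(tickets)

# Handle-time variance: spread between p90 and p50 handling time.
times = [t["handle_minutes"] for t in tickets]
handle_time_variance = percentile(times, 90) - percentile(times, 50)

# Dispute readiness: minutes from dispute open to evidence pack ready.
dispute_opened_min, evidence_ready_min = 0.0, 18.0  # stand-in timestamps
dispute_readiness = evidence_ready_min - dispute_opened_min

print(f"escalation rate: {escalation_rate:.0%}")
print(f"handle-time variance: {handle_time_variance:.1f} min")
print(f"dispute readiness: {dispute_readiness:.0f} min")
```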

What gets measured in every pilot

Escalation rate
Definition: Percent of policy tickets escalated from the frontline queue.
Collection method: Helpdesk tags + decision run logs joined by workflow ID (join sketch below).
Cadence: Weekly.

Handle-time variance
Definition: Spread in resolution time for the same policy class.
Collection method: Ticket timestamps grouped by policy intent and verdict.
Cadence: Weekly.

Dispute readiness
Definition: Time to assemble evidence for chargeback/dispute cases.
Collection method: request_id + deterministic rationale retrieval from the audit trail.
Cadence: Per incident + monthly summary.
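A hedged sketch of the escalation-rate join, assuming simplified helpdesk and run-log records keyed by a shared workflow_id; all IDs and field names here are illustrative.

```python
# Hypothetical records; real data comes from the helpdesk and run-log stores.
helpdesk_tickets = [
    {"workflow_id": "wf-101", "tags": ["refund", "escalated"]},
    {"workflow_id": "wf-102", "tags": ["cancel"]},
]
decision_runs = [
    {"workflow_id": "wf-101", "request_id": "req-aaa", "verdict": "deny"},
    {"workflow_id": "wf-102", "request_id": "req-bbb", "verdict": "approve"},
]

# Join helpdesk tags to decision runs on workflow_id.
runs_by_workflow = {run["workflow_id"]: run for run in decision_runs}
joined = [
    {**ticket, **runs_by_workflow[ticket["workflow_id"]]}
    for ticket in helpdesk_tickets
    if ticket["workflow_id"] in runs_by_workflow
]

escalated = [row for row in joined if "escalated" in row["tags"]]
print(f"escalation rate: {len(escalated) / len(joined):.0%}")
```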

Verification model

Every run logs deterministic verdict fields, payload hash, response hash, latency, and request_id. Pilot reviews use this data to compare manual policy handling vs. routed policy execution.
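A minimal sketch of what one run record could look like, assuming SHA-256 over canonical (key-sorted) JSON; the helper names and exact fields are illustrative, not the product's actual schema.

```python
import hashlib
import json
import time
import uuid

def canonical_hash(obj: dict) -> str:
    """SHA-256 over key-sorted JSON so identical payloads hash identically."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def log_run(payload: dict, response: dict, started_s: float) -> dict:
    """Hypothetical run record carrying the fields named above."""
    return {
        "request_id": str(uuid.uuid4()),
        "verdict": response.get("verdict"),  # deterministic verdict field
        "payload_hash": canonical_hash(payload),
        "response_hash": canonical_hash(response),
        "latency_ms": round((time.monotonic() - started_s) * 1000, 2),
    }

t0 = time.monotonic()
record = log_run({"intent": "refund"}, {"verdict": "approve"}, t0)
print(json.dumps(record, indent=2))
```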

Weekly pilot readout structure

Week 0
Decision scope: Baseline extraction.
Required output: Queue definition, metric formulas, and baseline values frozen.

Weeks 1-2
Decision scope: Routed decision rollout.
Required output: Daily run logs with request_id mapping to tickets.

Weeks 3-4
Decision scope: Outcome comparison.
Required output: Before/after scorecard with variance analysis and exception list.

Minimum evidence packet

Use this package for customer sign-off, internal reviews, and fundraising diligence.

Run export
CSV export of decision runs including status, latency, and request_id (see the export sketch after this list).
Queue metrics extract
Helpdesk metrics snapshot for baseline and current pilot windows.
Replay samples
At least five ticket-level replays that show deterministic output and action mapping.
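For illustration, a tiny sketch of the run-export shape, assuming just the three columns named above; a real export would likely carry more fields.

```python
import csv
import sys

# Hypothetical decision runs; a real export is pulled from the run-log store.
runs = [
    {"request_id": "req-aaa", "status": "approve", "latency_ms": 182},
    {"request_id": "req-bbb", "status": "deny", "latency_ms": 143},
]

writer = csv.DictWriter(sys.stdout, fieldnames=["request_id", "status", "latency_ms"])
writer.writeheader()
writer.writerows(runs)
```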