Execution Evidence
Proof over positioning.
decide pilots are judged by operational outcomes, not demo quality. Every pilot is scoped to one support queue and instrumented with request-level traceability.
Reference implementation (Decision API runtime): Krafthaus
Live pilot telemetry
Production runtime snapshot from metrics. Use this panel for weekly sprint readouts and YC screenshots.
Live counters start after first routed pilot traffic.
| Top Event (24h) | Count |
|---|---|
| Loading runtime telemetry... | |
Loading live metrics...
Pilot scorecard template
Use this structure for every pilot. Keep baseline, target, and current values visible in one place for weekly readouts.
request_id and rationale trace.What gets measured in every pilot
| Metric | Definition | Collection Method | Cadence |
|---|---|---|---|
| Escalation rate | Percent of policy tickets escalated from frontline queue. | Helpdesk tags + decision run logs joined by workflow ID. | Weekly |
| Handle-time variance | Spread in resolution time for same policy class. | Ticket timestamps grouped by policy intent and verdict. | Weekly |
| Dispute readiness | Time to assemble evidence for chargeback/dispute cases. | request_id + deterministic rationale retrieval from audit trail. |
Per incident + monthly summary |
Verification model
Every run logs deterministic verdict fields, payload hash, response hash, latency, and request_id. Pilot reviews use this data to compare manual policy handling vs. routed policy execution.
Weekly pilot readout structure
| Week | Decision scope | Required output |
|---|---|---|
| Week 0 | Baseline extraction | Queue definition, metric formulas, and baseline values frozen. |
| Week 1-2 | Routed decision rollout | Daily run logs with request_id mapping to tickets. |
| Week 3-4 | Outcome comparison | Before/after scorecard with variance analysis and exception list. |
Minimum evidence packet
Use this package for customer sign-off, internal reviews, and fundraising diligence.
request_id.