Validation Lab
Governance Verdict
Completed
RollingWithHoldout
Campaign Minutes Calibration Sprint 01 Run 5
Walk-forward validation audit, holdout protection, and final promotion readiness for challenger governance.
Status
Completed
Current lifecycle state of this validation run.
Primary Metric
MinutesMae
Governing metric used by the recommendation engine.
Recommendation
Reject
Policy-engine recommendation based on evaluation, holdout, and guardrails.
Window Win Rate
0.0%
Share of comparable windows won by the challenger.
Governance Verdict
Summary of the challenger’s posture after evaluation windows, holdout checks, and promotion policy gates.
Recommendation: Reject
Final: Reject
Decision Reason
Comparable evaluation windows=2 (required>=2).
Comparable evaluation games=168 (required>=50).
Comparable evaluation player rows=3354 (required>=300).
Holdout comparable=True (required=True).
Holdout games=112 (required>=20).
Holdout player rows=2239 (required>=120).
RequireHoldoutWin=True.
PrimaryMetricImprovementThreshold=0.05.
HoldoutRegressionTolerance=0.03.
Evaluation delta=1.729747.
Holdout delta=0.32548.
PointsMae delta=1.907037 (max regression=0.15).
MinutesMae delta=1.729747 (max regression=0).
FairSpreadMae delta=0.135613 (max regression=n/a).
FairTotalMae delta=0.013534 (max regression=n/a).
Failed holdout win gate.
Failed holdout regression tolerance gate.
Failed primary metric improvement gate.
Failed PointsMae guard.
Failed MinutesMae guard.
Recommendation=Reject because challenger failed performance or guardrail gates.
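The three performance gates named in the decision reason can be sketched as below. This is a minimal illustration, not the policy engine's actual code. It assumes that a delta is challenger minus baseline, that lower MAE is better, that the improvement threshold is an absolute amount, and that a holdout "win" means a negative holdout delta. The names `PerformancePolicy` and `failed_performance_gates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformancePolicy:
    # Values copied from the decision reason; field names are hypothetical.
    primary_metric_improvement_threshold: float = 0.05
    holdout_regression_tolerance: float = 0.03
    require_holdout_win: bool = True


def failed_performance_gates(evaluation_delta, holdout_delta, policy):
    """Return the names of the gates that fail.

    Assumption: delta = challenger - baseline, so a positive delta
    means the challenger's MAE is worse than the baseline's.
    """
    failures = []
    # Assumed meaning of a holdout win: challenger beats baseline on holdout.
    if policy.require_holdout_win and not holdout_delta < 0:
        failures.append("holdout win")
    # Assumed meaning: holdout may regress by at most the tolerance.
    if holdout_delta > policy.holdout_regression_tolerance:
        failures.append("holdout regression tolerance")
    # Assumed meaning: improvement (-delta) must reach the threshold.
    if -evaluation_delta < policy.primary_metric_improvement_threshold:
        failures.append("primary metric improvement")
    return failures


failures = failed_performance_gates(1.729747, 0.32548, PerformancePolicy())
print(failures)
```

Under these assumptions, the run's evaluation delta of 1.729747 and holdout delta of 0.32548 fail all three gates, which agrees with the failures listed in the decision reason.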
Governance Interpretation
A strong recommendation should still be read alongside holdout performance, guardrail posture, and the final human governance decision.
Decision Support
Audit
Validation Standard
Challenger approval should require both target-metric strength and acceptable regression behavior across protected secondary metrics.
Best Practice
Treat this page as the governance verdict surface. Promotion should remain deliberate and supported by both policy outcome and reviewer judgment.
Traceability
This run preserves immutable validation evidence, explicit gates, holdout results, and final-decision history.
Model Comparison
Baseline production model versus challenger candidate under governance review.
BaselineVsChallenger
Baseline
structured-0010-0003
Model 45
Challenger
structured-0014-0003
Model 75
Evaluation Evidence
Sample Quality
Comparable Windows
2
Insufficient Windows
0
Comparable Games
168
Comparable Player Rows
3354
Min Games / Window
20
Run-Level Sample Gate
Pass
Min Total Games
50
Min Player Rows
300
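The run-level sample gate can be approximated with the thresholds shown above. This is a sketch under assumptions: a window is treated as comparable when it has at least the minimum games per window, and player rows are counted from the baseline side (the reported total of 3354 equals 1276 + 2078, the baseline counts for the two evaluation windows). `WindowSample` and `run_level_sample_gate` are hypothetical names.

```python
from typing import NamedTuple


class WindowSample(NamedTuple):
    games: int
    player_rows: int


def run_level_sample_gate(windows, *, min_games_per_window=20, min_windows=2,
                          min_total_games=50, min_player_rows=300):
    """Return (passed, per-check results) for the sample-size checks."""
    # Assumption: windows below the per-window minimum do not count at all.
    comparable = [w for w in windows if w.games >= min_games_per_window]
    total_games = sum(w.games for w in comparable)
    total_rows = sum(w.player_rows for w in comparable)
    checks = {
        "windows": len(comparable) >= min_windows,
        "games": total_games >= min_total_games,
        "player_rows": total_rows >= min_player_rows,
    }
    return all(checks.values()), checks


# Baseline-side counts for the two evaluation windows of this run.
evaluation_windows = [WindowSample(64, 1276), WindowSample(104, 2078)]
passed, checks = run_level_sample_gate(evaluation_windows)
print(passed, checks)
```

With 2 comparable windows, 168 games, and 3354 player rows, every check passes, consistent with the Pass shown for the run-level sample gate.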
Run Definition
Config
Comparison Kind
BaselineVsChallenger
Window Strategy
RollingWithHoldout
Window Count
2
Gap Days
3
Window Size Days
14
Holdout Size Days
14
Start
2026-02-07
End
2026-03-23
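The run definition is enough to reconstruct the window schedule. The sketch below assumes that evaluation windows are contiguous and that the gap is applied only once, between the last evaluation window and the holdout. That reading is inferred from the start and end dates above, not documented behavior. The function name `rolling_with_holdout` is hypothetical.

```python
from datetime import date, timedelta


def rolling_with_holdout(start, window_count, window_days, gap_days, holdout_days):
    """Build (role, first_day, last_day) tuples with inclusive date ranges."""
    windows = []
    cursor = start
    for _ in range(window_count):
        end = cursor + timedelta(days=window_days - 1)
        windows.append(("Evaluation", cursor, end))
        cursor = end + timedelta(days=1)  # next window starts the following day
    # Assumption: the gap separates only the evaluation set from the holdout.
    holdout_start = cursor + timedelta(days=gap_days)
    holdout_end = holdout_start + timedelta(days=holdout_days - 1)
    windows.append(("Holdout", holdout_start, holdout_end))
    return windows


schedule = rolling_with_holdout(date(2026, 2, 7), window_count=2,
                                window_days=14, gap_days=3, holdout_days=14)
for role, first_day, last_day in schedule:
    print(role, first_day.isoformat(), last_day.isoformat())
```

With this run's settings the holdout ends on 2026-03-23, matching the End value, so the assumed layout is at least consistent with the configuration.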
Evaluation Summary
Main Set
Baseline Mean
9.88
Challenger Mean
11.61
Delta
1.73
Stability Score
0.02
Baseline Window Wins
2
Challenger Window Wins
0
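The evaluation summary can be roughly reproduced from the per-window values listed under Window Results. The sketch assumes a simple unweighted mean across comparable windows and that the lower MAE wins a window. Because the inputs here are the rounded display values, the result may differ in the last digit from the page (which reports a delta of 1.73 from the unrounded 1.729747). The Stability Score is not reproduced because its formula is not stated. `summarize` is a hypothetical name.

```python
from statistics import mean

# (baseline MinutesMae, challenger MinutesMae) per comparable evaluation
# window, as displayed (rounded) in the Window Results table.
evaluation_windows = [(10.49, 12.20), (9.26, 11.02)]


def summarize(windows):
    baseline_mean = mean(b for b, _ in windows)
    challenger_mean = mean(c for _, c in windows)
    # Assumption: lower MAE wins the window; exact ties count for neither side.
    challenger_wins = sum(1 for b, c in windows if c < b)
    baseline_wins = sum(1 for b, c in windows if b < c)
    return {
        "baseline_mean": baseline_mean,
        "challenger_mean": challenger_mean,
        "delta": challenger_mean - baseline_mean,
        "baseline_wins": baseline_wins,
        "challenger_wins": challenger_wins,
        "window_win_rate": challenger_wins / len(windows),
    }


summary = summarize(evaluation_windows)
print(summary)
```

The challenger wins neither window, so the win rate is 0.0, in line with the Window Win Rate shown at the top of the page.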
Holdout Summary
Final Check
Baseline Holdout
8.91
Challenger Holdout
9.23
Holdout Delta
0.33
Source Experiment
14
Created
2026-03-24 11:36
Completed
2026-03-24 11:37
Promotion Gates
Core gate outcomes used by the policy engine before guardrails and final governance review.
Policy Engine
Comparable Window Gate
Pass
Games Gate
Pass
Player Rows Gate
Pass
Primary Metric Gate
Fail
Holdout Gate
Fail
Secondary Metric Guardrails
Protected regression checks that prevent narrow gains from degrading adjacent model quality.
Guardrails
Points MAE Guard
Fail
Delta: 1.91
Minutes MAE Guard
Fail
Delta: 1.73
Fair Spread MAE Guard
Pass
Delta: 0.14
Fair Total MAE Guard
Pass
Delta: 0.01
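The guardrail outcomes above are consistent with a simple rule: a guard fails when the metric's delta exceeds its maximum allowed regression. The sketch below assumes that rule, assumes the limit is inclusive, and treats a limit of n/a as "no limit" (represented as `None`). The deltas and limits are the ones quoted in the decision reason. `guard_passes` is a hypothetical name.

```python
from typing import Optional

# metric -> (delta, max regression); None stands for the "n/a" limit.
GUARDS = {
    "PointsMae": (1.907037, 0.15),
    "MinutesMae": (1.729747, 0.0),
    "FairSpreadMae": (0.135613, None),
    "FairTotalMae": (0.013534, None),
}


def guard_passes(delta: float, max_regression: Optional[float]) -> bool:
    """Assumed rule: pass when there is no limit or delta <= limit."""
    if max_regression is None:
        return True
    return delta <= max_regression


results = {name: guard_passes(delta, limit)
           for name, (delta, limit) in GUARDS.items()}
for name, ok in results.items():
    print(name, "Pass" if ok else "Fail")
```

Under this rule PointsMae and MinutesMae fail while both fair-line metrics pass, which matches the guardrail statuses shown on the page.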
Final Decision Governance
Human governance decision layer on top of advisor recommendation and immutable validation evidence.
Final Review
Advisor Recommendation
Reject
Final Decision
Reject
Overridden
No
Decision By
automation-dashboard
Decision Time
2026-03-24 11:37
Final Decision Reason
Auto-applied by automation workflow
Governance Action
Record the final decision with explicit reasoning. This creates the last decision layer above the policy engine’s recommendation.
Window Results
Comparable and holdout window outcomes across the full walk-forward validation timeline.
RollingWithHoldout
3 window(s)
Window Reading Guide
Use comparable windows to judge repeated challenger strength, and the holdout window to test whether the improvement survives outside the main evaluation set.
| # | Role | Status | Range | Baseline | Challenger | Delta | Winner | Games | Player Rows | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Evaluation | Comparable | 2026-02-07 → 2026-02-20 | 10.49 | 12.20 | 1.71 | Baseline | B: 64, C: 64 | B: 1276, C: 1279 | - |
| 2 | Evaluation | Comparable | 2026-02-21 → 2026-03-06 | 9.26 | 11.02 | 1.75 | Baseline | B: 104, C: 104 | B: 2078, C: 2078 | - |
| 3 | Holdout | Holdout | 2026-03-10 → 2026-03-23 | 8.91 | 9.23 | 0.33 | Baseline | B: 112, C: 112 | B: 2239, C: 2239 | - |
Notes
Context
Description
Automated campaign validation
Internal Notes
Final decision set to Reject by automation-dashboard at 2026-03-24T18:37:57.6008703Z. Reason: Auto-applied by automation workflow