Validation Lab Governance Verdict Completed RollingWithHoldout

Validate Active v1 vs Challenger 22

Walk-forward validation audit, holdout protection, and final promotion readiness for challenger governance.
Status
Completed
Current lifecycle state of this validation run.
Primary Metric
PraMae
Governing metric used by the recommendation engine.
Recommendation
NeedsReview
Policy-engine recommendation based on evaluation, holdout, and guardrails.
Window Win Rate
0.0%
Share of comparable windows won by the challenger.
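A minimal sketch of the win-rate calculation. It assumes the rate is challenger window wins divided by comparable windows, and that a run with no comparable windows reports 0%. The function name `window_win_rate` is hypothetical and not part of the validation lab.

```python
def window_win_rate(challenger_wins: int, comparable_windows: int) -> float:
    """Percentage of comparable windows won by the challenger.

    Returns 0.0 when no window is comparable (an assumed convention).
    """
    if comparable_windows <= 0:
        return 0.0
    return 100.0 * challenger_wins / comparable_windows


# Values from this run: the challenger won 0 of 1 comparable windows.
print(f"{window_win_rate(0, 1):.1f}%")  # 0.0%
```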

Governance Verdict

Summary of the challenger’s posture after evaluation windows, holdout checks, and promotion policy gates.
Recommendation: NeedsReview
Decision Reason
Comparable evaluation windows: 1. Insufficient evaluation windows: 3.
Comparable evaluation games: 144. Comparable player rows: 2875.
Thresholds: MinimumGamesPerWindow=10, MinimumTotalGames=40, MinimumPlayerRows=400.
Evaluation mean PraMae: baseline=10.532744, challenger=10.609572, delta=0.076828.
Holdout PraMae: baseline=9.530819, challenger=9.322832, delta=-0.207987.
Challenger won 0 of 1 comparable evaluation windows (0.00%).
Deltas are challenger minus baseline; because PraMae is an error metric, a negative delta favors the challenger.
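The deltas in the Decision Reason can be reproduced as challenger minus baseline. The sketch assumes PraMae is an error metric where lower is better, so a negative delta favors the challenger. `metric_delta` and `favors_challenger` are illustrative names, not functions from the policy engine.

```python
def metric_delta(baseline: float, challenger: float) -> float:
    """Challenger minus baseline, matching the Decision Reason's convention."""
    return challenger - baseline


def favors_challenger(delta: float) -> bool:
    # Assumes PraMae is an error metric: lower is better, so a negative delta wins.
    return delta < 0


eval_delta = metric_delta(baseline=10.532744, challenger=10.609572)
holdout_delta = metric_delta(baseline=9.530819, challenger=9.322832)
print(f"{eval_delta:+.6f}", favors_challenger(eval_delta))        # +0.076828 False
print(f"{holdout_delta:+.6f}", favors_challenger(holdout_delta))  # -0.207987 True
```

The evaluation set and the holdout disagree in direction here, which is one reason the run lands on review rather than promotion.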
Governance Interpretation
Even a strong recommendation should be read alongside holdout performance, guardrail posture, and the final human governance decision.

Decision Support

Audit
Validation Standard
Challenger approval should require both target-metric strength and acceptable regression behavior across protected secondary metrics.
Best Practice
Treat this page as the governance verdict surface. Promotion should remain deliberate and supported by both policy outcome and reviewer judgment.
Traceability
This run preserves immutable validation evidence, explicit gates, holdout results, and final-decision history.

Model Comparison

Baseline production model versus challenger candidate under governance review.
BaselineVsChallenger
Baseline
v1
Model 1
Challenger
random-0007-0006
Model 22

Evaluation Evidence

Sample Quality
Comparable Windows
0
Insufficient Windows
4
Comparable Games
0
Comparable Player Rows
0
Min Games / Window
10
Run-Level Sample Gate
Fail
Min Total Games
40
Min Player Rows
400
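A sketch of how a run-level sample gate could apply the thresholds in this panel. It assumes both minimums must be met and that comparisons are inclusive (`>=`); `SampleThresholds` and `run_level_sample_gate` are hypothetical names. Fed this panel's values (0 comparable games, 0 player rows), it returns Fail, matching the Run-Level Sample Gate shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleThresholds:
    min_total_games: int = 40    # "Min Total Games" on this page
    min_player_rows: int = 400   # "Min Player Rows" on this page


def run_level_sample_gate(games: int, player_rows: int,
                          t: SampleThresholds = SampleThresholds()) -> str:
    """Pass only if comparable games and player rows both meet their minimums."""
    if games >= t.min_total_games and player_rows >= t.min_player_rows:
        return "Pass"
    return "Fail"


print(run_level_sample_gate(0, 0))  # Fail
```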

Run Definition

Config
Comparison Kind
BaselineVsChallenger
Window Strategy
RollingWithHoldout
Window Count
4
Gap Days
3
Window Size Days
21
Holdout Size Days
21
Start
2025-11-10
End
2026-03-09
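One way to derive the window ranges from this config is sketched below. It assumes evaluation windows are contiguous with inclusive date ranges, and that the gap days separate only the final evaluation window from the holdout; both assumptions are inferred from the Window Results ranges rather than documented. `rolling_with_holdout` is a hypothetical helper.

```python
from datetime import date, timedelta


def rolling_with_holdout(start: date, window_count: int, window_days: int,
                         gap_days: int, holdout_days: int) -> list[tuple[str, date, date]]:
    """Return (role, first_day, last_day) tuples with inclusive ranges."""
    windows = []
    cursor = start
    for _ in range(window_count):
        # Evaluation windows are assumed to run back to back.
        last = cursor + timedelta(days=window_days - 1)
        windows.append(("Evaluation", cursor, last))
        cursor = last + timedelta(days=1)
    # The gap is assumed to apply only before the holdout window.
    holdout_start = cursor + timedelta(days=gap_days)
    holdout_end = holdout_start + timedelta(days=holdout_days - 1)
    windows.append(("Holdout", holdout_start, holdout_end))
    return windows


for role, first, last in rolling_with_holdout(date(2025, 11, 10), 4, 21, 3, 21):
    print(role, first, "->", last)
```

With this config the sketch yields the same five ranges listed in Window Results, from 2025-11-10 -> 2025-11-30 through the 2026-02-05 -> 2026-02-25 holdout.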

Evaluation Summary

Main Set
Baseline Mean
10.53
Challenger Mean
10.61
Delta
0.08
Stability Score
Baseline Window Wins
1
Challenger Window Wins
0
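A sketch of how the Main Set figures could be aggregated from per-window results. It assumes means are taken over comparable windows only and that a tied window counts as a win for neither model. `WindowResult` and `summarize` are hypothetical. The example marks window 4 as comparable, as the Decision Reason does, and uses the run's unrounded means as that window's values because it is the only comparable window.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class WindowResult:
    index: int
    baseline: Optional[float]    # window PraMae; None if the window was not scored
    challenger: Optional[float]
    comparable: bool


def summarize(windows: list[WindowResult]) -> Optional[dict]:
    scored = [w for w in windows
              if w.comparable and w.baseline is not None and w.challenger is not None]
    if not scored:
        return None
    baseline_mean = mean(w.baseline for w in scored)
    challenger_mean = mean(w.challenger for w in scored)
    return {
        "baseline_mean": baseline_mean,
        "challenger_mean": challenger_mean,
        "delta": challenger_mean - baseline_mean,
        # Lower error wins a window; ties count for neither model.
        "baseline_wins": sum(w.baseline < w.challenger for w in scored),
        "challenger_wins": sum(w.challenger < w.baseline for w in scored),
    }


windows = [
    WindowResult(1, None, None, False),
    WindowResult(2, None, None, False),
    WindowResult(3, None, None, False),
    WindowResult(4, 10.532744, 10.609572, True),
]
print(summarize(windows))
```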

Holdout Summary

Final Check
Baseline Holdout
9.53
Challenger Holdout
9.32
Holdout Delta
-0.21
Source Experiment
7
Created
2026-03-09 17:41
Completed
2026-03-09 17:41

Promotion Gates

Core gate outcomes used by the policy engine before guardrails and final governance review.
Policy Engine
Comparable Window Gate
Fail
Games Gate
Fail
Player Rows Gate
Fail
Primary Metric Gate
Fail
Holdout Gate
Fail
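A deliberately simple, hypothetical mapping from gate and guardrail outcomes to a recommendation. The policy engine's actual rules are not shown on this page; in this sketch any failed gate or guardrail yields NeedsReview, and "Promote" is an assumed label for the all-pass case.

```python
from typing import Mapping


def recommend(gates: Mapping[str, bool], guardrails: Mapping[str, bool]) -> str:
    """Hypothetical policy: recommend promotion only when everything passes."""
    if all(gates.values()) and all(guardrails.values()):
        return "Promote"  # hypothetical label for the all-pass case
    return "NeedsReview"


# Outcomes as reported on this page (True = Pass).
gates = {
    "comparable_windows": False,
    "games": False,
    "player_rows": False,
    "primary_metric": False,
    "holdout": False,
}
guardrails = {
    "points_mae": True,
    "minutes_mae": True,
    "fair_spread_mae": True,
    "fair_total_mae": True,
}
print(recommend(gates, guardrails))  # NeedsReview
```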

Secondary Metric Guardrails

Protected regression checks that stop a narrow gain on the primary metric from being accepted at the cost of degraded quality on adjacent metrics.
Guardrails
Points MAE Guard
Pass
Delta: —
Minutes MAE Guard
Pass
Delta: —
Fair Spread MAE Guard
Pass
Delta: —
Fair Total MAE Guard
Pass
Delta: —
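A sketch of a single guardrail check. It assumes each guard compares the challenger-minus-baseline MAE delta against a tolerated regression. The tolerance value, and the choice to pass when no delta was computed (as the empty deltas above suggest), are assumptions rather than documented policy; `guardrail` is a hypothetical name.

```python
from typing import Optional


def guardrail(delta: Optional[float], max_regression: float) -> str:
    """Return Pass or Fail for one secondary-metric guard.

    delta is challenger MAE minus baseline MAE, so positive values are
    regressions. A missing delta is treated as Pass (an assumption).
    """
    if delta is None:
        return "Pass"
    return "Pass" if delta <= max_regression else "Fail"


# 0.05 is an illustrative tolerance, not a value from this run.
print(guardrail(None, 0.05))   # Pass
print(guardrail(0.12, 0.05))   # Fail
```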

Final Decision Governance

Human governance decision layer on top of the advisor recommendation and the immutable validation evidence.
Final Review
Advisor Recommendation
NeedsReview
Final Decision
Overridden
No
Decision By
Decision Time
Final Decision Reason
Governance Action
Record the final decision with explicit reasoning. This creates the last decision layer above the policy engine’s recommendation.
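A sketch of a final decision record, assuming the fields shown under Final Review. `FinalDecision` and its field names are hypothetical. `overridden` is derived by comparing the final decision with the advisor recommendation, and a blank reason is rejected to enforce explicit reasoning. Freezing the dataclass echoes the page's emphasis on immutable evidence, but it does not make any underlying storage immutable.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class FinalDecision:
    advisor_recommendation: str
    final_decision: str
    decision_by: str
    reason: str
    decided_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        # Require explicit reasoning for every recorded decision.
        if not self.reason.strip():
            raise ValueError("final decision reason must not be empty")

    @property
    def overridden(self) -> bool:
        # An override is any final decision that differs from the advisor's.
        return self.final_decision != self.advisor_recommendation


decision = FinalDecision(
    advisor_recommendation="NeedsReview",
    final_decision="NeedsReview",
    decision_by="reviewer-1",  # hypothetical reviewer id
    reason="Sample gates fail; rerun once more comparable windows exist.",
)
print(decision.overridden)  # False
```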

Window Results

Comparable and holdout window outcomes across the full walk-forward validation timeline.
RollingWithHoldout: 5 windows (4 evaluation, 1 holdout)
Window Reading Guide
Use comparable windows to judge repeated challenger strength, and the holdout window to test whether the improvement survives outside the main evaluation set.
# | Role | Status | Range | Baseline | Challenger | Delta | Winner | Games | Player Rows | Notes
1 | Evaluation | Insufficient Data | 2025-11-10 → 2025-11-30 | | | | | B: 0, C: 0 | B: 0, C: 0 |
2 | Evaluation | Insufficient Data | 2025-12-01 → 2025-12-21 | | | | | B: 0, C: 0 | B: 0, C: 0 |
3 | Evaluation | Insufficient Data | 2025-12-22 → 2026-01-11 | | | | | B: 0, C: 0 | B: 0, C: 0 |
4 | Evaluation | Insufficient Data | 2026-01-12 → 2026-02-01 | 10.53 | 10.61 | 0.08 | | B: 144, C: 144 | B: 2875, C: 2875 |
5 | Holdout | Holdout / Insufficient | 2026-02-05 → 2026-02-25 | 9.53 | 9.32 | -0.21 | | B: 115, C: 115 | B: 2295, C: 2295 |