Validation Lab · Governance Verdict · Running · RollingWithHoldout

Validate Active v1 vs Challenger 22

Walk-forward validation audit, holdout protection, and final promotion readiness for challenger governance.
Status
Running
Current lifecycle state of this validation run.
Primary Metric
PraMae
Governing metric used by the recommendation engine.
Recommendation
NeedsReview
Policy-engine recommendation based on evaluation, holdout, and guardrails.
Window Win Rate
Share of comparable windows won by the challenger.
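
The win-rate definition above can be sketched as a simple ratio. This is a minimal illustration, not the application's code: the function name and the choice to return `None` when there are no comparable windows are assumptions, made here because the ratio is undefined for a run like this one with zero comparable windows.

```python
from typing import Optional


def window_win_rate(challenger_wins: int, comparable_windows: int) -> Optional[float]:
    """Share of comparable windows won by the challenger.

    Returns None when there are no comparable windows, because the
    ratio is undefined in that case (assumed behavior, not confirmed).
    """
    if comparable_windows <= 0:
        return None
    return challenger_wins / comparable_windows


# This run: 0 challenger wins across 0 comparable windows, so no rate is reported.
print(window_win_rate(0, 0))
```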

Governance Verdict

Summary of the challenger’s posture after evaluation windows, holdout checks, and promotion policy gates.
Recommendation: NeedsReview
Decision Reason
No promotion decision summary generated yet.
Governance Interpretation
A strong recommendation should still be read alongside holdout performance, guardrail posture, and the final human governance decision.

Decision Support

Audit
Validation Standard
Challenger approval should require both target-metric strength and acceptable regression behavior across protected secondary metrics.
Best Practice
Treat this page as the governance verdict surface. Promotion should remain deliberate and supported by both policy outcome and reviewer judgment.
Traceability
This run preserves immutable validation evidence, explicit gates, holdout results, and final-decision history.

Model Comparison

Baseline production model versus challenger candidate under governance review.
BaselineVsChallenger
Baseline
v1
Model 1
Challenger
random-0007-0006
Model 22

Evaluation Evidence

Sample Quality
Comparable Windows
0
Insufficient Windows
4
Comparable Games
0
Comparable Player Rows
0
Min Games / Window
10
Run-Level Sample Gate
Fail
Min Total Games
40
Min Player Rows
400
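
One way to read the thresholds above is as a two-level check: a window is comparable only if it has enough games, and the run passes only if its comparable windows together meet the totals. The sketch below follows that reading. The class and function names are hypothetical, and counting only comparable windows toward the totals is an assumption the page does not confirm.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SampleThresholds:
    # Values shown on this page; the field names are hypothetical.
    min_games_per_window: int = 10
    min_total_games: int = 40
    min_player_rows: int = 400


def run_level_sample_gate(
    games_per_window: Sequence[int],
    player_rows_per_window: Sequence[int],
    thresholds: SampleThresholds = SampleThresholds(),
) -> bool:
    """Return True (Pass) when the comparable windows meet the run-level minimums."""
    # A window is comparable when it meets the per-window games minimum.
    comparable = [
        i for i, games in enumerate(games_per_window)
        if games >= thresholds.min_games_per_window
    ]
    # Assumption: only comparable windows count toward the run totals.
    total_games = sum(games_per_window[i] for i in comparable)
    total_rows = sum(player_rows_per_window[i] for i in comparable)
    return (
        len(comparable) > 0
        and total_games >= thresholds.min_total_games
        and total_rows >= thresholds.min_player_rows
    )


# This run: four windows with no games, so the gate fails.
print(run_level_sample_gate([0, 0, 0, 0], [0, 0, 0, 0]))
```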

Run Definition

Config
Comparison Kind
BaselineVsChallenger
Window Strategy
RollingWithHoldout
Window Count
4
Gap Days
3
Window Size Days
21
Holdout Size Days
21
Start
2025-11-09
End
2026-03-08
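
The configuration above is enough to sketch how the window schedule could be derived. The function below is an illustration under stated assumptions, not the application's scheduler: it assumes evaluation windows are back to back, ranges are inclusive, and the gap applies only between the last evaluation window and the holdout. Under those assumptions it reproduces the ranges listed under Window Results.

```python
from datetime import date, timedelta
from typing import List, Tuple

Window = Tuple[str, date, date]  # (role, first day, last day), both inclusive


def rolling_with_holdout(
    start: date,
    window_count: int,
    window_size_days: int,
    gap_days: int,
    holdout_size_days: int,
) -> List[Window]:
    """Build contiguous evaluation windows followed by a gapped holdout window."""
    windows: List[Window] = []
    cursor = start
    for _ in range(window_count):
        end = cursor + timedelta(days=window_size_days - 1)
        windows.append(("Evaluation", cursor, end))
        cursor = end + timedelta(days=1)  # next window starts the following day
    # Assumption: the gap separates the evaluation set from the holdout only.
    holdout_start = cursor + timedelta(days=gap_days)
    holdout_end = holdout_start + timedelta(days=holdout_size_days - 1)
    windows.append(("Holdout", holdout_start, holdout_end))
    return windows


for role, first, last in rolling_with_holdout(date(2025, 11, 9), 4, 21, 3, 21):
    print(role, first.isoformat(), last.isoformat())
```

The holdout in this schedule ends on 2026-02-24, before the configured End of 2026-03-08; the sketch does not use the End date.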

Evaluation Summary

Main Set
Baseline Mean
Challenger Mean
Delta
Stability Score
Baseline Window Wins
0
Challenger Window Wins
0

Holdout Summary

Final Check
Baseline Holdout
Challenger Holdout
Holdout Delta
Source Experiment
7
Created
2026-03-09 16:58
Completed

Promotion Gates

Core gate outcomes used by the policy engine before guardrails and final governance review.
Policy Engine
Comparable Window Gate
Fail
Games Gate
Fail
Player Rows Gate
Fail
Primary Metric Gate
Fail
Holdout Gate
Fail

Secondary Metric Guardrails

Protected regression checks that prevent narrow gains from degrading adjacent model quality.
Guardrails
Points MAE Guard
Pass
Delta: —
Minutes MAE Guard
Pass
Delta: —
Fair Spread MAE Guard
Pass
Delta: —
Fair Total MAE Guard
Pass
Delta: —
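
A guardrail of this kind can be sketched as a bounded-regression check on an error metric. Everything in the sketch is hypothetical: the function name, the `max_regression` tolerance, and the `pass_when_missing` flag. The flag is there because this page shows Pass next to an empty delta, which suggests, but does not confirm, that a guard with no data defaults to Pass.

```python
from typing import Optional, Tuple


def mae_guardrail(
    baseline_mae: Optional[float],
    challenger_mae: Optional[float],
    max_regression: float = 0.0,
    pass_when_missing: bool = True,
) -> Tuple[bool, Optional[float]]:
    """Return (passed, delta) for a lower-is-better metric such as MAE.

    delta = challenger - baseline, so a positive delta is a regression.
    The guard passes when the regression stays within max_regression.
    """
    if baseline_mae is None or challenger_mae is None:
        # Assumed behavior: no data means the guard cannot fail.
        return pass_when_missing, None
    delta = challenger_mae - baseline_mae
    return delta <= max_regression, delta


# No metric values are available for this run, so the delta is empty.
print(mae_guardrail(None, None))
```

A stricter policy could set `pass_when_missing=False` so that a run with no comparable data cannot clear its guardrails by default.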

Final Decision Governance

Human governance decision layer on top of advisor recommendation and immutable validation evidence.
Final Review
Advisor Recommendation
NeedsReview
Final Decision
Overridden
No
Decision By
Decision Time
Final Decision Reason

Window Results

Comparable and holdout window outcomes across the full walk-forward validation timeline.
RollingWithHoldout: 5 windows (4 evaluation + 1 holdout)
Window Reading Guide
Use comparable windows to judge repeated challenger strength, and the holdout window to test whether the improvement survives outside the main evaluation set.
# | Role | Status | Range | Baseline | Challenger | Delta | Winner | Games | Player Rows | Notes
1 | Evaluation | Insufficient Data | 2025-11-09 → 2025-11-29 | | | | | B: 0, C: 0 | B: 0, C: 0 |
2 | Evaluation | Insufficient Data | 2025-11-30 → 2025-12-20 | | | | | B: 0, C: 0 | B: 0, C: 0 |
3 | Evaluation | Insufficient Data | 2025-12-21 → 2026-01-10 | | | | | B: 0, C: 0 | B: 0, C: 0 |
4 | Evaluation | Insufficient Data | 2026-01-11 → 2026-01-31 | | | | | B: —, C: — | B: —, C: — |
5 | Holdout | Holdout / Insufficient | 2026-02-04 → 2026-02-24 | | | | | B: —, C: — | B: —, C: — |