Validation Lab Walk-Forward Governance Gate

New Validation Run

Configure a policy-driven walk-forward validation for challenger versus baseline governance, holdout protection, and promotion readiness.

Validation Definition

Define the run identity and the primary metric that governs recommendation logic.
Run Identity, Primary Metric
Validation Intent
This run evaluates challenger performance on the selected primary metric, then applies governance thresholds before making any promotion recommendation.

Model Comparison

Select the production baseline and the challenger model that will face governance review.
Baseline vs Challenger, Traceable
Optional traceability link to the experiment that produced the challenger.
Comparison Reminder
Validation should compare a stable production baseline against a challenger hypothesis. Avoid running governance on two experimental models at the same time.

Window Strategy

Define the rolling evaluation cadence, holdout structure, and validation time horizon.
RollingWithHoldout, Holdout Aware
Windowing Guidance
Use enough windows to establish repeated challenger behavior, and reserve holdout space to detect late overfitting or fragile gains.
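As an illustration of the windowing guidance above, here is a minimal sketch of how a date range could be split into rolling evaluation windows with a reserved final holdout. The `Window` type, the `build_windows` function, and the `window_days` and `holdout_days` parameters are hypothetical names chosen for this example; the page does not describe how the product computes its windows.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class Window:
    label: str
    start: date  # inclusive
    end: date    # inclusive


def build_windows(range_start: date, range_end: date,
                  window_days: int, holdout_days: int) -> List[Window]:
    """Split [range_start, range_end] into full-length evaluation windows
    followed by one final holdout window (assumed layout, not the product's)."""
    if window_days <= 0 or holdout_days <= 0:
        raise ValueError("window_days and holdout_days must be positive")

    # The holdout occupies the last `holdout_days` days of the range.
    holdout_start = range_end - timedelta(days=holdout_days - 1)
    if holdout_start <= range_start:
        raise ValueError("range is too short to reserve a holdout")

    windows: List[Window] = []
    cursor = range_start
    index = 1
    # Emit only full-length windows that end before the holdout begins.
    # Leftover days between the last window and the holdout stay unused.
    while cursor + timedelta(days=window_days - 1) < holdout_start:
        end = cursor + timedelta(days=window_days - 1)
        windows.append(Window(f"eval-{index}", cursor, end))
        cursor = end + timedelta(days=1)
        index += 1

    windows.append(Window("holdout", holdout_start, range_end))
    return windows


if __name__ == "__main__":
    for w in build_windows(date(2025, 10, 21), date(2026, 1, 18), 30, 30):
        print(w.label, w.start, w.end)
```

Keeping the holdout at the end of the range means no evaluation window can overlap it, which is what lets it expose late overfitting.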

Evidence Thresholds

Set minimum evidence requirements for evaluation windows and total sample sufficiency.
Sample Controls, Sufficiency
Evidence Standard
These thresholds prevent low-sample validations from generating misleading promotion recommendations.
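A sufficiency check of this kind might look like the sketch below. `WindowEvidence` and `insufficient_windows` are hypothetical names. Games and player rows are used as the evidence units only because the policy example on this page counts them for the holdout; the real per-window thresholds may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class WindowEvidence:
    label: str
    games: int
    player_rows: int


def insufficient_windows(windows: Iterable[WindowEvidence],
                         min_games: int,
                         min_player_rows: int) -> List[str]:
    """Return the labels of windows that fall below either minimum."""
    return [
        w.label
        for w in windows
        if w.games < min_games or w.player_rows < min_player_rows
    ]


if __name__ == "__main__":
    # Illustrative counts only, not real coverage data.
    evidence = [
        WindowEvidence("eval-1", games=45, player_rows=900),
        WindowEvidence("eval-2", games=12, player_rows=800),
        WindowEvidence("holdout", games=25, player_rows=250),
    ]
    print(insufficient_windows(evidence, min_games=20, min_player_rows=300))
```

A run with any flagged window would be treated as lacking evidence rather than as a challenger loss.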

Promotion Policy Engine

Persist the exact decision policy snapshot that will govern recommendation outcomes.
Policy-Versioned, Replayable
Defines comparable-window requirements, improvement thresholds, holdout protection, and recommendation gate logic.
Optional metadata for secondary metric review, guardrails, and future composite governance logic.
Policy Example
{
  "minimumComparableEvaluationWindows": 2,
  "primaryMetricImprovementThreshold": 0.05,
  "holdoutRegressionTolerance": 0.05,
  "minimumHoldoutGames": 20,
  "minimumHoldoutPlayerRows": 300,
  "requireHoldoutComparable": true,
  "requireHoldoutWin": false,
  "maxPointsMaeRegression": 0.15,
  "maxMinutesMaeRegression": 0.50,
  "maxFairSpreadMaeRegression": 0.25,
  "maxFairTotalMaeRegression": 0.25
}
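
To make the gate logic concrete, the following is a rough sketch of how a policy like the one above could be turned into a recommendation. It rests on several assumptions that this page does not confirm: the primary metric is an error where lower is better, thresholds are relative to the baseline value, and improvement is averaged across evaluation windows. The `recommend` and `relative_improvement` functions and the input shapes are hypothetical, and the four `max...MaeRegression` guardrails are left out for brevity.

```python
import json
from statistics import mean
from typing import List, Optional, Tuple

# Subset of the policy example above (secondary-metric guardrails omitted).
POLICY_JSON = """
{
  "minimumComparableEvaluationWindows": 2,
  "primaryMetricImprovementThreshold": 0.05,
  "holdoutRegressionTolerance": 0.05,
  "minimumHoldoutGames": 20,
  "minimumHoldoutPlayerRows": 300,
  "requireHoldoutComparable": true,
  "requireHoldoutWin": false
}
"""


def relative_improvement(baseline: float, challenger: float) -> float:
    """Positive when the challenger's error is lower (assumes lower is better)."""
    if baseline <= 0:
        raise ValueError("baseline metric must be positive")
    return (baseline - challenger) / baseline


def recommend(policy: dict,
              eval_windows: List[Tuple[float, float]],
              holdout: Optional[dict]) -> Tuple[str, List[str]]:
    """Return ("promote" | "hold", reasons) under the assumed semantics.

    eval_windows: (baseline_error, challenger_error) per comparable window.
    holdout: dict with baseline, challenger, games, player_rows, or None.
    """
    reasons: List[str] = []

    if len(eval_windows) < policy["minimumComparableEvaluationWindows"]:
        reasons.append("too few comparable evaluation windows")
    else:
        avg = mean(relative_improvement(b, c) for b, c in eval_windows)
        if avg < policy["primaryMetricImprovementThreshold"]:
            reasons.append("primary metric improvement below threshold")

    # The holdout is comparable only if it meets both sample minimums.
    comparable = (
        holdout is not None
        and holdout["games"] >= policy["minimumHoldoutGames"]
        and holdout["player_rows"] >= policy["minimumHoldoutPlayerRows"]
    )
    if policy["requireHoldoutComparable"] and not comparable:
        reasons.append("holdout window is not comparable")

    if comparable:
        gain = relative_improvement(holdout["baseline"], holdout["challenger"])
        if gain < -policy["holdoutRegressionTolerance"]:
            reasons.append("holdout regression exceeds tolerance")
        if policy["requireHoldoutWin"] and gain <= 0:
            reasons.append("challenger did not win the holdout")

    return ("promote" if not reasons else "hold", reasons)


if __name__ == "__main__":
    policy = json.loads(POLICY_JSON)
    holdout = {"baseline": 10.0, "challenger": 9.8, "games": 25, "player_rows": 400}
    print(recommend(policy, [(10.0, 9.0), (10.0, 9.2)], holdout))
```

Returning the list of reasons alongside the decision keeps the outcome explainable when the same policy snapshot is replayed later.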

Notes

Internal context, reviewer comments, or governance annotations for this validation run.
Audit Trail

Execution

Action
Create the validation structure first, or create and execute immediately using the current policy snapshot.
Recommended Practice
Use Create Validation Run when you want a final review of policy and ranges. Use Create and Run only when the governance inputs are already final.

Coverage Advisor

Data-Aware
Coverage Start
2025-10-21
Coverage End
2026-04-29
Eligible Games
1273
Requested Range Games
693
Coverage Summary
Eligible final-game coverage runs from 2025-10-21 to 2026-04-29. Requested range currently contains 693 eligible games.
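The coverage figures above could be derived roughly as in the sketch below, which counts eligible games inside a requested range. `coverage_summary` and its output keys are hypothetical names, and the dates in the usage example are made up for illustration; they are not the data behind the numbers shown on this page.

```python
from datetime import date
from typing import Dict, Sequence


def coverage_summary(game_dates: Sequence[date],
                     requested_start: date,
                     requested_end: date) -> Dict[str, object]:
    """Summarize eligible-game coverage and the count inside a requested range.

    game_dates holds one entry per eligible final game (assumed input shape).
    Both range bounds are treated as inclusive.
    """
    if not game_dates:
        raise ValueError("no eligible games")
    in_range = sum(1 for d in game_dates if requested_start <= d <= requested_end)
    return {
        "coverage_start": min(game_dates),
        "coverage_end": max(game_dates),
        "eligible_games": len(game_dates),
        "requested_range_games": in_range,
    }


if __name__ == "__main__":
    # Illustrative dates only.
    dates = [date(2025, 10, 21), date(2025, 11, 5), date(2026, 1, 10), date(2026, 4, 29)]
    print(coverage_summary(dates, date(2025, 11, 1), date(2026, 1, 31)))
```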

Governance Snapshot

Immutable
Deterministic Replay, Immutable Validation Results, Policy-Based Recommendation, Holdout Protection, Promotion Auditability

Expected Flow

Lifecycle
1. Baseline vs Challenger
Establish the governed comparison pair.
2. Rolling Evaluation Windows
Compare repeated historical windows using the selected policy.
3. Final Holdout Window
Check whether challenger gains remain stable outside the evaluation set.
4. Stored Immutable Results
Persist audit-ready validation evidence.
5. Promotion Review Ready
Hand the run into governance with recommendation context.