Validation Lab Walk-Forward Governance Gate

New Validation Run

Configure a policy-driven walk-forward validation for challenger versus baseline governance, holdout protection, and promotion readiness.

Validation Definition

Define the run identity and the primary metric that governs recommendation logic.

Run Identity, Primary Metric

Validation Name

Primary Metric

Description

Validation Intent

This run will evaluate the challenger against the baseline on the selected primary metric, then apply governance thresholds before issuing any promotion recommendation.

Model Comparison

Select the production baseline and the challenger model that will face governance review.

Baseline vs Challenger, Traceable

Baseline Model

Challenger Model

Source Experiment Id

Optional traceability link to the experiment that produced the challenger.

Comparison Kind

Comparison Reminder

Validation should compare a stable production baseline against a single challenger hypothesis. Avoid comparing two experimental models against each other, because neither provides a trusted reference point.

Window Strategy

Define the rolling evaluation cadence, holdout structure, and validation time horizon.

RollingWithHoldout, Holdout Aware

Window Count

Window Size Days

Holdout Size Days

Gap Days

Start Date UTC

End Date UTC

Window Strategy

Windowing Guidance

Use enough windows to establish repeated challenger behavior, and reserve holdout space to detect late overfitting or fragile gains.
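As a rough illustration of how the fields above could fit together, the sketch below lays out evaluation windows forward from the start date and reserves the final days before the end date as the holdout. The layout rule is an assumption, not the tool's confirmed behavior: here Gap Days separates consecutive windows and also separates the last evaluation window from the holdout. The names `Window` and `plan_windows` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Window:
    kind: str    # "evaluation" or "holdout"
    start: date  # inclusive
    end: date    # inclusive


def plan_windows(start, end, window_count, window_size_days,
                 holdout_size_days, gap_days):
    """Lay out evaluation windows forward from `start` and reserve the
    last `holdout_size_days` before `end` as the holdout (assumed layout)."""
    holdout_start = end - timedelta(days=holdout_size_days - 1)
    windows = []
    cursor = start
    for _ in range(window_count):
        window_end = cursor + timedelta(days=window_size_days - 1)
        # Require at least `gap_days` untouched days before the holdout.
        if window_end + timedelta(days=gap_days) >= holdout_start:
            raise ValueError("evaluation windows run into the holdout")
        windows.append(Window("evaluation", cursor, window_end))
        # The next window starts once the gap has passed.
        cursor = window_end + timedelta(days=gap_days + 1)
    windows.append(Window("holdout", holdout_start, end))
    return windows


plan = plan_windows(date(2026, 1, 1), date(2026, 3, 31),
                    window_count=3, window_size_days=14,
                    holdout_size_days=14, gap_days=1)
for w in plan:
    print(w.kind, w.start, w.end)
```

Raising an error when the windows reach the holdout, instead of silently shrinking them, keeps the holdout untouched by evaluation data.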

Evidence Thresholds

Set minimum evidence requirements for evaluation windows and total sample sufficiency.

Sample Controls, Sufficiency

Minimum Games / Window

Minimum Total Games

Minimum Player Rows

Evidence Standard

These thresholds prevent low-sample validations from generating misleading promotion recommendations.
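A minimal sketch of how the three thresholds above might be checked before a recommendation is produced. The function name `check_sufficiency`, its inputs, and the message wording are hypothetical; the real gate may count games and player rows differently.

```python
def check_sufficiency(window_game_counts, total_player_rows,
                      min_games_per_window, min_total_games,
                      min_player_rows):
    """Return a list of shortfalls; an empty list means the evidence
    meets every threshold."""
    problems = []
    # Per-window check: each evaluation window needs enough games.
    for index, games in enumerate(window_game_counts, start=1):
        if games < min_games_per_window:
            problems.append(
                f"window {index}: {games} games < {min_games_per_window}")
    # Whole-run checks: total games and total player rows.
    total_games = sum(window_game_counts)
    if total_games < min_total_games:
        problems.append(f"total games: {total_games} < {min_total_games}")
    if total_player_rows < min_player_rows:
        problems.append(
            f"player rows: {total_player_rows} < {min_player_rows}")
    return problems


shortfalls = check_sufficiency([30, 12, 40], total_player_rows=2000,
                               min_games_per_window=20,
                               min_total_games=100,
                               min_player_rows=1500)
print(shortfalls)
```

Returning every shortfall at once, not only the first, lets a reviewer see all the reasons a run is under-sampled.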

Promotion Policy Engine

Persist the exact decision policy snapshot that will govern recommendation outcomes.

Policy-Versioned, Replayable

Promotion Gate Config Json

Defines comparable-window requirements, improvement thresholds, holdout protection, and recommendation gate logic.

Secondary Metrics Json

Optional metadata for secondary metric review, guardrails, and future composite governance logic.

Policy Example

{
  "minimumComparableEvaluationWindows": 2,
  "primaryMetricImprovementThreshold": 0.05,
  "holdoutRegressionTolerance": 0.05,
  "minimumHoldoutGames": 20,
  "minimumHoldoutPlayerRows": 300,
  "requireHoldoutComparable": true,
  "requireHoldoutWin": false,
  "maxPointsMaeRegression": 0.15,
  "maxMinutesMaeRegression": 0.50,
  "maxFairSpreadMaeRegression": 0.25,
  "maxFairTotalMaeRegression": 0.25
}
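The following sketch shows one way a policy like the example above could be applied to a run summary. Everything beyond the policy keys is an assumption: the `summary` field names, the "promote" and "hold" labels, and the reading of each threshold (improvement and regression values are treated as signed differences where a positive regression means the challenger got worse).

```python
import json

POLICY = json.loads("""{
  "minimumComparableEvaluationWindows": 2,
  "primaryMetricImprovementThreshold": 0.05,
  "holdoutRegressionTolerance": 0.05,
  "minimumHoldoutGames": 20,
  "minimumHoldoutPlayerRows": 300,
  "requireHoldoutComparable": true,
  "requireHoldoutWin": false,
  "maxPointsMaeRegression": 0.15,
  "maxMinutesMaeRegression": 0.50,
  "maxFairSpreadMaeRegression": 0.25,
  "maxFairTotalMaeRegression": 0.25
}""")

# Policy limit -> hypothetical summary field holding the observed value.
GUARDRAILS = {
    "maxPointsMaeRegression": "pointsMaeRegression",
    "maxMinutesMaeRegression": "minutesMaeRegression",
    "maxFairSpreadMaeRegression": "fairSpreadMaeRegression",
    "maxFairTotalMaeRegression": "fairTotalMaeRegression",
}


def evaluate_gate(policy, summary):
    """Return (recommendation, reasons); any failed check blocks promotion."""
    reasons = []
    if summary["comparableWindows"] < policy["minimumComparableEvaluationWindows"]:
        reasons.append("too few comparable evaluation windows")
    if summary["primaryImprovement"] < policy["primaryMetricImprovementThreshold"]:
        reasons.append("primary metric improvement below threshold")
    # Assumed meaning of "comparable": the holdout has enough games and rows.
    holdout_comparable = (
        summary["holdoutGames"] >= policy["minimumHoldoutGames"]
        and summary["holdoutPlayerRows"] >= policy["minimumHoldoutPlayerRows"])
    if policy["requireHoldoutComparable"] and not holdout_comparable:
        reasons.append("holdout is not comparable")
    if summary["holdoutRegression"] > policy["holdoutRegressionTolerance"]:
        reasons.append("holdout regression exceeds tolerance")
    if policy["requireHoldoutWin"] and summary["holdoutRegression"] >= 0:
        reasons.append("challenger did not win the holdout")
    for limit_key, observed_key in GUARDRAILS.items():
        if summary[observed_key] > policy[limit_key]:
            reasons.append(f"{observed_key} exceeds {limit_key}")
    return ("promote" if not reasons else "hold"), reasons


summary = {
    "comparableWindows": 3, "primaryImprovement": 0.08,
    "holdoutGames": 25, "holdoutPlayerRows": 400, "holdoutRegression": 0.01,
    "pointsMaeRegression": 0.20, "minutesMaeRegression": 0.10,
    "fairSpreadMaeRegression": 0.05, "fairTotalMaeRegression": 0.05,
}
print(evaluate_gate(POLICY, summary))
```

In this sketch the secondary-metric limits act as guardrails: a challenger that clears the primary threshold is still held if any one of them is exceeded.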

Notes

Internal context, reviewer comments, or governance annotations for this validation run.

Audit Trail

Execution

Action

Create the validation structure first, or create and execute immediately using the current policy snapshot.

Recommended Practice

Use Create Validation Run when you want a final review of policy and ranges. Use Create and Run only when the governance inputs are already final.

Coverage Advisor

Data-Aware

Coverage Start

2025-10-21

Coverage End

2026-05-03

Eligible Games

1282

Requested Range Games

673

Coverage Summary

Eligible final-game coverage runs from 2025-10-21 to 2026-05-03. The requested range currently contains 673 of the 1282 eligible games.

Governance Snapshot

Immutable

Deterministic Replay, Immutable Validation Results, Policy-Based Recommendation, Holdout Protection, Promotion Auditability

Expected Flow

Lifecycle

1. Baseline vs Challenger

Establish the governed comparison pair.

2. Rolling Evaluation Windows

Compare repeated historical windows using the selected policy.

3. Final Holdout Window

Check whether challenger gains remain stable outside the evaluation set.

4. Stored Immutable Results

Persist audit-ready validation evidence.

5. Promotion Review Ready

Hand the run off to governance review with its recommendation context.
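Steps 4 and 5 rely on the stored policy snapshot staying exactly as it was when the run executed. One common way to make that checkable, shown here only as a sketch and not as the tool's documented mechanism, is to store a fingerprint of the canonicalized policy JSON next to the results. The function name `policy_fingerprint` is hypothetical.

```python
import hashlib
import json


def policy_fingerprint(policy):
    """Hash a canonical JSON form of the policy so that key order and
    whitespace do not change the result."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


stored = {"minimumHoldoutGames": 20, "requireHoldoutWin": False}
reordered = {"requireHoldoutWin": False, "minimumHoldoutGames": 20}
edited = {"minimumHoldoutGames": 10, "requireHoldoutWin": False}

print(policy_fingerprint(stored) == policy_fingerprint(reordered))
print(policy_fingerprint(stored) == policy_fingerprint(edited))
```

On replay, recomputing the fingerprint and comparing it with the stored value would reveal whether the policy was changed after the run.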