solver.press

Performative scenario optimization solutions converge to classical stochastic programming solutions as the strength of the decision-feedback effect decreases, with convergence rate bounded by the Lipschitz modulus of the distribution map.

Mathematics · Apr 1, 2026 · Evaluation Score: 70%

Adversarial Debate Score

70% survival rate under critique

Model Critiques

anthropic: The hypothesis is mathematically precise and falsifiable, and the core claim about convergence to classical stochastic programming as feedback weakens is conceptually well-grounded in the performative optimization paper; however, the specific claim about convergence *rate* being bounded by the Li...
google: The hypothesis is highly falsifiable and mathematically grounded, as it proposes
grok: Falsifiable via theoretical analysis or numerical tests; aligns with performative optimization framework contrasting feedback effects, with Lipschitz bounds plausible from related multifunction stability papers. Lacks direct proof in excerpts; irrelevant papers weaken support.

Supporting Research Papers

Formal Verification

Z3 logical consistency: ⚠️ Unverified

Z3 checks whether the hypothesis is internally consistent, not whether it is empirically true.

Experimental Validation Package

This discovery has a Claude-generated validation package with a full experimental design.

Precise Hypothesis

For a performative scenario optimization problem parameterized by a decision-feedback strength ε ≥ 0, let x*(ε) denote the performatively stable solution and x*(0) the classical stochastic programming (SP) solution. The hypothesis asserts: ||x*(ε) - x*(0)|| ≤ L·ε for all ε in [0, ε_max], where L is the Lipschitz modulus of the distribution map D: X → P(Z) (mapping decisions to probability distributions over outcomes Z), and convergence is measured in a norm consistent with the decision space X. Formally, lim_{ε→0} x*(ε) = x*(0) with rate O(ε·L).
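The linear-quadratic class below admits a closed form that illustrates the claimed rate. Assuming a nonzero base mean μ₀ (an assumption added here for illustration; with a zero base mean the stable point is degenerately 0 for every ε < 1), so that D(x; ε) = N(εx + μ₀, I): the stable point solves x = εx + μ₀, giving x*(ε) = μ₀/(1 − ε) and ||x*(ε) − x*(0)|| = ε||μ₀||/(1 − ε) = O(ε). A quick numerical check of the log-log slope:

```python
import numpy as np

# Closed-form check for the linear-quadratic class with base mean mu0.
# (mu0 != 0 is an assumption added for illustration; mu0 = 0 gives the
# degenerate stable point x*(eps) = 0 for every eps < 1.)
# Stable point: x = eps * x + mu0  =>  x*(eps) = mu0 / (1 - eps),
# hence ||x*(eps) - x*(0)|| = eps * ||mu0|| / (1 - eps) = O(eps).
mu0_norm = 1.0
eps_grid = np.array([0.01, 0.05, 0.1, 0.2])
dist = eps_grid * mu0_norm / (1.0 - eps_grid)

# Log-log slope is close to 1, consistent with linear convergence in eps.
alpha, _ = np.polyfit(np.log(eps_grid), np.log(dist), 1)
```

On this grid the fitted slope is ≈ 1.06: the 1/(1 − ε) factor inflates it slightly above 1 at larger ε, one reason the protocol keeps ε well below ε_max.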

Disproof criteria:
  1. HARD DISPROOF: Empirical measurement of ||x*(ε) - x*(0)|| > C·ε for some constant C > L + δ (δ = 0.01) across ≥ 3 independent problem instances, with statistical significance p < 0.01.
  2. HARD DISPROOF: Existence of a single problem instance where ||x*(ε) - x*(0)||/ε diverges as ε → 0 (equivalent to sub-linear growth of the distance itself; confirmed via log-log regression slope < 0.9 with R² > 0.95).
  3. HARD DISPROOF: Convergence rate empirically measured as O(ε^α) with α < 0.9 (sub-linear in ε) across multiple problem classes.
  4. SOFT DISPROOF: The Lipschitz bound L is not computable or is infinite for standard problem classes (e.g., Gaussian mixture distribution maps), making the bound vacuous.
  5. SOFT DISPROOF: Convergence holds only for a measure-zero set of distribution maps, indicating the result is not generic.
  6. PARTIAL DISPROOF: Convergence rate is correct (O(ε)) but the constant L is not tight — empirical constant exceeds L by factor > 10, making the bound practically useless.
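Several of these criteria reduce to threshold checks on one log-log fit. A minimal sketch of that fit (the function name and interface are illustrative, not taken from the validation package):

```python
import numpy as np

def fit_convergence_rate(eps_grid, distances):
    """Fit ||x*(eps) - x*(0)|| ~ C * eps^alpha by log-log least squares.

    Returns (alpha, C, r2). The disproof and success criteria reduce to
    threshold checks on these three numbers, e.g. alpha in [0.9, 1.1]
    with r2 >= 0.95 supports the O(eps) claim.
    """
    log_eps = np.log(np.asarray(eps_grid, dtype=float))
    log_d = np.log(np.asarray(distances, dtype=float))
    alpha, log_C = np.polyfit(log_eps, log_d, 1)
    pred = alpha * log_eps + log_C
    ss_res = float(np.sum((log_d - pred) ** 2))
    ss_tot = float(np.sum((log_d - np.mean(log_d)) ** 2))
    return float(alpha), float(np.exp(log_C)), 1.0 - ss_res / ss_tot

# Sanity check on exact O(eps) data: recovers alpha = 1, C = 2, r2 = 1.
eps = np.array([0.01, 0.05, 0.1, 0.2])
alpha, C, r2 = fit_convergence_rate(eps, 2.0 * eps)
```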

Experimental Protocol

PHASE 1 — Synthetic Validation (Days 1–15): Construct 5 canonical problem classes with analytically known L and x*(0):
  (A) Linear-quadratic: f(x, z) = (x − z)², D(x; ε) = N(εx, I)
  (B) Portfolio optimization: f = −μᵀx + λxᵀΣx, D shifts the return mean by εAx
  (C) Newsvendor: f = c·max(d − x, 0) + h·max(x − d, 0), D(x; ε) shifts the demand mean by εx
  (D) Logistic regression loss with performative labels
  (E) Convex quadratic with affine distribution shift

For each class, compute x*(ε) numerically for ε ∈ {0, 0.01, 0.05, 0.1, 0.2, 0.5} using repeated stochastic gradient descent to performative stability. Measure ||x*(ε) - x*(0)|| and fit log-log regression to estimate convergence rate α and constant C.

PHASE 2 — Lipschitz Bound Tightness (Days 16–25): Compute L analytically for each problem class. Compare empirical constant C to theoretical L. Assess tightness ratio C/L. Test whether C ≤ L holds universally.
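For the Gaussian mean-shift maps in classes (A), (B), and (E), L is available in closed form: when D(x) = N(Ax + b, Σ) with fixed Σ, the two distributions D(x) and D(x′) are translates of each other, so W1(D(x), D(x′)) = ‖A(x − x′)‖ and hence L = σ_max(A), the spectral norm of A. A sketch comparing this to a pairwise sampling estimate (the estimator here mirrors, but is not, the package's estimate_lipschitz_modulus; A is a randomly generated illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
A = rng.standard_normal((d, d)) / np.sqrt(d)  # illustrative shift matrix

# Closed form: for D(x) = N(Ax + b, Sigma) with fixed Sigma, the distributions
# are translates, so W1(D(x), D(x')) = ||A(x - x')|| and L = sigma_max(A).
L_analytic = np.linalg.norm(A, 2)

# Monte Carlo estimate: max ratio ||A(x - x')|| / ||x - x'|| over random pairs.
# Sampling a sup can only under-estimate it, so L_mc <= L_analytic.
ratios = []
for _ in range(2000):
    x, x_prime = rng.standard_normal(d), rng.standard_normal(d)
    diff = x - x_prime
    ratios.append(np.linalg.norm(A @ diff) / np.linalg.norm(diff))
L_mc = max(ratios)
```

Comparing L_mc against L_analytic gives a direct check on whether the pairwise estimator is tight enough for the Phase 2 tightness-ratio analysis.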

PHASE 3 — Stress Testing (Days 26–35): Test boundary conditions: non-convex f, non-Lipschitz D, near-degenerate ε_max. Document failure modes.

Required datasets:
  1. SYNTHETIC — Linear-quadratic performative optimization instances: Generated programmatically; d ∈ {10, 100, 1000} dimensions; 500 instances per dimension; no external source required.
  2. SYNTHETIC — Portfolio optimization: 50-asset covariance matrices from historical S&P 500 data (2010–2023); source: Yahoo Finance API or CRSP; N=252 scenarios per year.
  3. SYNTHETIC — Newsvendor demand distributions: Exponential, Gamma, and log-normal with ε-shifted means; 1000 Monte Carlo samples per ε value.
  4. BENCHMARK — Performative prediction datasets from Perdomo et al. (2020) replication package (credit scoring with strategic agents); publicly available at github.com/google-research/performative-prediction.
  5. VALIDATION — Stochastic programming benchmark library (SIPLIB): 20 standard SP instances adapted with artificial distribution maps; available at www.isye.gatech.edu/~sahmed/siplib/.
  6. COMPUTATIONAL — Scenario trees with N ∈ {10, 50, 100, 500, 1000} scenarios for convergence-in-N analysis.
Success:
  1. PRIMARY: Log-log regression slope α ∈ [0.9, 1.1] for ≥ 4 of 5 problem classes (the null hypothesis α = 1 not rejected at the 0.05 level).
  2. PRIMARY: Empirical constant C ≤ 1.5·L for ≥ 3 of 5 problem classes (bound is tight within factor 1.5).
  3. PRIMARY: R² ≥ 0.95 for log-log regression in ≥ 4 of 5 problem classes (confirming power-law relationship).
  4. SECONDARY: Convergence holds for all ε ∈ [0, 0.8·ε_max] without numerical instability.
  5. SECONDARY: Variance across 20 seeds is < 10% of mean ||x*(ε)-x*(0)|| for all ε ≥ 0.01.
  6. SECONDARY: Results replicate for d ∈ {10, 100, 1000} with dimension-independent convergence rate α.
  7. TERTIARY: Bound remains valid (C ≤ 10·L) even for non-convex f in ≥ 2 of 3 stress-test instances.
Failure:
  1. HARD FAILURE: α < 0.8 or α > 1.3 for ≥ 3 of 5 problem classes (convergence rate is not O(ε)).
  2. HARD FAILURE: C > 10·L for any problem class where L is finite and computable (bound is vacuously loose).
  3. HARD FAILURE: R² < 0.80 for ≥ 3 of 5 problem classes (no consistent power-law relationship).
  4. HARD FAILURE: Performative stable point iteration fails to converge for ε < ε_max in ≥ 2 problem classes (existence assumption violated).
  5. SOFT FAILURE: α ∈ [0.8, 0.9] for ≥ 3 classes (sub-linear but not catastrophic — hypothesis partially supported).
  6. SOFT FAILURE: L is not computable (infinite or undefined) for ≥ 2 problem classes, making the bound non-operational.
  7. SOFT FAILURE: Convergence rate α varies by > 0.3 across problem classes, suggesting result is problem-class-specific rather than general.

GPU_HOURS: 12

CPU_HOURS: 480

MEMORY_GB: 32

COST_USD_MIN: 150

COST_USD_FULL: 1200

Dashboard summary — GPU hours: 100 · Time to result: 30d · Min cost: $1,000 · Full cost: $10,000


Implementation Sketch

# Performative Scenario Optimization Convergence Validator
# Architecture: 3-module pipeline

import numpy as np

# Helper routines referenced below (projected_gradient_descent, grad_f,
# project_to_feasible, wasserstein_1_distance, log_warning, log_critical,
# generate_report) are assumed to be defined elsewhere in the package.

# MODULE 1: Problem Class Factory
class PerformativeProblem:
    def __init__(self, problem_type, dim, epsilon, L_true=None):
        self.type = problem_type  # 'LQ', 'portfolio', 'newsvendor', 'logistic', 'quadratic'
        self.dim = dim
        self.epsilon = epsilon
        self.L_true = L_true  # analytical Lipschitz modulus if known
    
    def distribution_map(self, x, epsilon):
        # Returns distribution D(x; epsilon) as (mean, cov) or scenario weights
        if self.type == 'LQ':
            # D(x; eps) = N(eps * x, I)
            mean = epsilon * x
            cov = np.eye(self.dim)
            return mean, cov
        elif self.type == 'portfolio':
            # D shifts return mean by epsilon * A @ x
            mean = self.mu_base + epsilon * self.A @ x
            return mean, self.Sigma
        # ... other types
    
    def objective(self, x, z):
        # f(x, z) — must be convex in x
        if self.type == 'LQ':
            return np.sum((x - z)**2)
        # ... other types

# MODULE 2: Stable Point Solver
def find_stable_point(problem, epsilon, n_scenarios=500, 
                       max_iter=10000, tol=1e-6, eta=0.01):
    """Repeated gradient descent to performative stability."""
    x = np.zeros(problem.dim)  # initialization
    for outer_iter in range(max_iter):
        x_prev = x.copy()
        # Sample scenarios from D(x_prev; epsilon)
        mean, cov = problem.distribution_map(x_prev, epsilon)
        scenarios = np.random.multivariate_normal(mean, cov, n_scenarios)
        
        # Inner loop: minimize E_{z~D(x_prev)}[f(x, z)]
        x = projected_gradient_descent(
            grad_fn=lambda u: np.mean([grad_f(u, z) for z in scenarios], axis=0),
            x_init=x_prev,
            projection=lambda u: project_to_feasible(u, problem),
            eta=eta, tol=tol/10
        )
        
        # Check stable point convergence
        if np.linalg.norm(x - x_prev) < tol:
            return x, outer_iter
    
    raise RuntimeError(f"No stable point found after {max_iter} iterations")

def find_classical_sp_solution(problem, n_scenarios=500, tol=1e-6):
    """Classical SP: epsilon=0, fixed distribution."""
    return find_stable_point(problem, epsilon=0.0, 
                              n_scenarios=n_scenarios, tol=tol)

# MODULE 3: Convergence Rate Estimator
def estimate_convergence_rate(problem_class, dim, epsilon_grid, 
                               n_seeds=20, n_scenarios=500):
    """
    Returns: alpha (convergence rate), C (empirical constant), 
             L_empirical, R_squared
    """
    results = {eps: [] for eps in epsilon_grid}
    
    # Get classical SP solution (ground truth)
    x_star_0, _ = find_classical_sp_solution(
        PerformativeProblem(problem_class, dim, 0.0), n_scenarios
    )
    
    for eps in epsilon_grid:
        for seed in range(n_seeds):
            np.random.seed(seed)
            prob = PerformativeProblem(problem_class, dim, eps)
            x_star_eps, _ = find_stable_point(prob, eps, n_scenarios)
            dist = np.linalg.norm(x_star_eps - x_star_0)
            results[eps].append(dist)
    
    # Aggregate
    means = [np.mean(results[eps]) for eps in epsilon_grid]
    
    # Log-log regression (exclude eps=0)
    log_eps = np.log(epsilon_grid[1:])
    log_dist = np.log(means[1:])
    alpha, log_C = np.polyfit(log_eps, log_dist, 1)
    C = np.exp(log_C)
    
    # R-squared
    log_dist_pred = alpha * log_eps + log_C
    ss_res = np.sum((log_dist - log_dist_pred)**2)
    ss_tot = np.sum((log_dist - np.mean(log_dist))**2)
    R2 = 1 - ss_res / ss_tot
    
    # Empirical Lipschitz modulus
    L_empirical = estimate_lipschitz_modulus(
        PerformativeProblem(problem_class, dim, 1.0), n_pairs=1000
    )
    
    return {
        'alpha': alpha, 'C': C, 'L_empirical': L_empirical,
        'R2': R2, 'tightness_ratio': C / L_empirical,
        'means': means, 'epsilon_grid': epsilon_grid
    }

def estimate_lipschitz_modulus(problem, n_pairs=1000):
    """Estimate L = sup W1(D(x;1), D(x';1)) / ||x-x'||"""
    ratios = []
    for _ in range(n_pairs):
        x = np.random.randn(problem.dim)
        x_prime = np.random.randn(problem.dim)
        w1 = wasserstein_1_distance(
            problem.distribution_map(x, 1.0),
            problem.distribution_map(x_prime, 1.0)
        )
        ratios.append(w1 / (np.linalg.norm(x - x_prime) + 1e-10))
    return np.max(ratios)

# MAIN EXPERIMENT RUNNER
epsilon_grid = [0.0, 0.01, 0.05, 0.1, 0.2, 0.5]
problem_classes = ['LQ', 'portfolio', 'newsvendor', 'logistic', 'quadratic']
dimensions = [10, 100, 1000]

all_results = {}
for pc in problem_classes:
    for dim in dimensions:
        key = f"{pc}_d{dim}"
        all_results[key] = estimate_convergence_rate(
            pc, dim, epsilon_grid, n_seeds=20, n_scenarios=500
        )
        # Early abort check
        if all_results[key]['R2'] < 0.70:
            log_warning(f"Low R2={all_results[key]['R2']} for {key} — potential failure")
        if all_results[key]['alpha'] < 0.7 or all_results[key]['alpha'] > 1.5:
            log_critical(f"Alpha={all_results[key]['alpha']} out of range for {key}")

# Generate summary table and convergence plots
generate_report(all_results, output_dir='./results/')
Abort checkpoints:
  1. DAY 3 CHECKPOINT: If performative stable point iteration fails to converge for ε = 0.1 in the LQ problem class (simplest case), abort and debug solver before proceeding. Criterion: convergence tolerance not reached within 10,000 iterations for ≥ 50% of seeds.
  2. DAY 7 CHECKPOINT: If R² < 0.70 for log-log regression on LQ problem class (d=10), abort Phase 1 and investigate whether power-law relationship exists at all. This is the easiest case; failure here indicates fundamental issue.
  3. DAY 12 CHECKPOINT: If empirical Lipschitz modulus L cannot be estimated (variance > 100% of mean across 1000 random pairs) for ≥ 2 problem classes, abort bound-tightness analysis — the bound is not computable.
  4. DAY 18 CHECKPOINT: If α < 0.7 or α > 1.5 for ≥ 3 of 5 problem classes in Phase 1, abort Phase 2 and Phase 3 — core hypothesis is likely false; redirect to characterizing actual convergence rate.
  5. DAY 25 CHECKPOINT: If tightness ratio C/L > 100 for all problem classes, abort commercial value assessment — bound is theoretically correct but practically useless; redirect to tighter bound derivation.
  6. DAY 30 CHECKPOINT: If stress testing reveals that convergence fails (α < 0.5) for all non-convex f instances, scope the hypothesis to convex f only and update hypothesis restatement before finalizing report.

Source

AegisMind Research