
Performative scenario optimization solutions converge to classical stochastic optimization solutions in the limit where the decision's influence on the data-generating process vanishes, analogous to how entropic optimal transport converges to classical optimal transport as the regularization approaches zero.

Mathematics · Apr 1, 2026 · Evaluation Score: 67%

Adversarial Debate Score

67% survival rate under critique

Model Critiques

anthropic: The hypothesis is mathematically plausible and falsifiable in principle, and the performative scenario optimization paper provides a relevant foundation, but the entropic optimal transport analogy is only loosely supported by the cited papers (the McKean-Pontryagin paper doesn't directly address ...
grok: Falsifiable via mathematical limits; plausible by construction as zero influence reduces performative to classical optimization, with strong entropic OT analogy supported by papers. Minor weakness: primary PSO paper excerpts do not explicitly prove convergence.
google: The hypothesis is highly falsifiable and logically sound, as the definitions

Supporting Research Papers

Formal Verification

Z3 logical consistency: ⚠️ Unverified

Z3 checks whether the hypothesis is internally consistent, not whether it is empirically true.

Experimental Validation Package

This discovery has a Claude-generated validation package with a full experimental design.

Precise Hypothesis

For a performative scenario optimization problem parameterized by a decision variable θ ∈ Θ ⊆ ℝ^d, let D(θ) denote the decision-dependent data distribution and D₀ the baseline (decision-independent) distribution. Define the influence parameter ε ≥ 0 such that D_ε(θ) = (1-ε)·D(θ) + ε·D₀ interpolates between full performativity (ε=0) and no performativity (ε=1). The hypothesis states: as ε → 1 (equivalently, as the Wasserstein distance W₂(D_ε(θ), D₀) → 0 uniformly in θ), the performative stable solution θ_PS(ε) converges in norm to the classical stochastic optimization solution θ_SO = argmin_θ E_{D₀}[ℓ(θ, Z)], with convergence rate ‖θ_PS(ε) - θ_SO‖ = O((1-ε)^α) for some α > 0. This is falsifiable: if the limit does not exist, if convergence is non-monotone, or if the rate is slower than any polynomial, the hypothesis is refuted.
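A scalar special case makes the claimed limit concrete (an illustrative sketch, not part of the validation package; the constants a, b below are arbitrary): with ℓ(θ, z) = ½(θ - z)² and D(θ) = N(aθ + b, σ²), |a| < 1, the stable point solves θ = (1-ε)(aθ + b) + εb, giving θ_PS(ε) = b / (1 - (1-ε)a) while θ_SO = b, so the error is O(1-ε), i.e. α = 1.

```python
# Scalar quadratic sanity check of the hypothesized O(1-eps) rate.
# theta_PS solves the fixed point theta = (1-eps)*(a*theta + b) + eps*b.

def theta_ps(eps, a=0.5, b=2.0):
    """Performative stable point under residual influence (1-eps)*a."""
    return b / (1.0 - (1.0 - eps) * a)

def theta_so(b=2.0):
    """Classical solution: the mean of the baseline distribution D0."""
    return b

errors = [abs(theta_ps(e) - theta_so()) for e in (0.9, 0.99, 0.999)]
# Each tenfold shrink of (1-eps) should shrink the error roughly tenfold.
ratios = [errors[i] / errors[i + 1] for i in range(2)]
```

At ε = 1 the two solutions coincide exactly, and successive error ratios sit near 10, consistent with α = 1.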

Disproof criteria:
  1. PRIMARY DISPROOF: Numerical demonstration that ‖θ_PS(ε) - θ_SO‖ does not converge to 0 as ε → 1 in any of ≥3 distinct well-conditioned test problems satisfying boundary conditions, with tolerance < 10⁻³.
  2. RATE DISPROOF: If log‖θ_PS(ε) - θ_SO‖ vs. log(1-ε) shows no linear trend (R² < 0.8) across ε ∈ [0.9, 0.9999], convergence rate claim is refuted.
  3. COUNTEREXAMPLE: A single analytically tractable example satisfying all boundary conditions where the limit exists but differs from θ_SO.
  4. NON-MONOTONE CONVERGENCE: If ‖θ_PS(ε) - θ_SO‖ increases as ε increases from 0.95 to 0.9999 in >50% of test cases, the monotone convergence claim is refuted.
  5. INSTABILITY DISPROOF: If the performative stable solution θ_PS(ε) is non-unique or discontinuous in ε for problems satisfying β < μ/L (with β the sensitivity of the distribution map and μ, L the strong-convexity and smoothness constants of the loss), the theoretical framework is undermined.
  6. ANALOGY BREAKDOWN: If the convergence structure is qualitatively different from entropic OT (e.g., requires different scaling, exhibits phase transitions), the analogical claim is weakened even if convergence holds.

Experimental Protocol

PHASE 1 — Analytical Verification (Days 1–14): Derive closed-form solutions for 3 tractable problem classes: (A) quadratic loss with Gaussian performative distribution, (B) linear regression with mean-shift performativity, (C) portfolio optimization with covariance-shift performativity. Verify convergence analytically and compute exact rates.

PHASE 2 — Numerical Convergence Study (Days 15–35): Implement gradient-based repeated risk minimization (RRM) and performative gradient descent across ε ∈ {0.0, 0.1, 0.2, ..., 0.9, 0.95, 0.99, 0.999, 0.9999}. Run 5 problem families × 10 random seeds × 14 ε values = 700 experiments. Measure ‖θ_PS(ε) - θ_SO‖₂ at convergence.

PHASE 3 — Rate Estimation (Days 36–50): Fit power-law model ‖error‖ = C·(1-ε)^α via log-log regression. Bootstrap confidence intervals (n=1000 resamples). Compare α estimates across problem families.

PHASE 4 — Stress Testing (Days 51–65): Test near-boundary conditions: β approaching μ/L, non-smooth losses (Huber), heavy-tailed distributions (Student-t with ν=3), and high-dimensional θ (d ∈ {10, 100, 1000}).

PHASE 5 — Analogy Formalization (Days 66–80): Construct explicit mathematical mapping between performativity parameter ε and entropic OT regularization λ. Verify whether convergence rates match under this mapping.
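The entropic OT side of the analogy can be spot-checked with a minimal Sinkhorn iteration (a sketch assuming only NumPy; the point clouds, sizes, and λ grid are illustrative). For sorted 1-D clouds of equal size the exact OT plan is the identity matching, so the entropic-to-exact cost gap should shrink as λ → 0, mirroring ε → 1 above.

```python
import numpy as np

def sinkhorn_cost(x, y, lam, n_iter=3000):
    """Transport cost <P, C> of the entropic OT plan between uniform
    measures on 1-D point clouds x, y (squared-distance cost, reg. lam)."""
    C = (x[:, None] - y[None, :]) ** 2
    K = np.exp(-C / lam)
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    v = np.ones(len(y))
    for _ in range(n_iter):          # alternating marginal scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]  # entropic transport plan
    return float((P * C).sum())

rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=40))
y = np.sort(rng.normal(loc=1.0, size=40))
exact = float(((x - y) ** 2).mean())  # exact 1-D OT cost: sorted matching
gaps = [sinkhorn_cost(x, y, lam) - exact for lam in (1.0, 0.3, 0.1)]
# gaps should decrease toward 0 as lam shrinks
```

The decreasing gap sequence is the qualitative behavior Phase 5 would need to match quantitatively under the ε ↔ λ mapping.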

Required datasets:
  1. SYNTHETIC — Quadratic performative problems: Generated programmatically. Parameters: d ∈ {2, 10, 50, 100}, β ∈ {0.01, 0.1, 0.5·μ/L}, μ ∈ {0.1, 1.0, 10.0}. Size: ~10,000 problem instances. Generation cost: negligible (CPU only).
  2. SYNTHETIC — Linear regression with distribution shift: n_samples ∈ {100, 1000, 10000} per distribution, feature dimension d ∈ {5, 20, 100}. Performativity: mean shift proportional to θ with coefficient β.
  3. SYNTHETIC — Portfolio optimization: Assets k ∈ {5, 20, 50}, covariance matrix shift model Σ(θ) = Σ₀ + β·θθᵀ/‖θ‖². Historical return data from Yahoo Finance (2010–2023) as D₀ baseline.
  4. REAL-WORLD PROXY — Strategic classification dataset: UCI Adult Income dataset (48,842 samples, 14 features) with simulated strategic response model as performativity proxy. Available at: https://archive.ics.uci.edu/ml/datasets/adult.
  5. REAL-WORLD PROXY — Credit scoring with feedback: Kaggle Give Me Some Credit dataset (150,000 samples) with loan approval → repayment behavior feedback loop model.
  6. REFERENCE IMPLEMENTATION: Perdomo et al. (2020) performative prediction codebase (GitHub: gradientinstitute/performative-prediction) as baseline comparison.
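The covariance-shift model in dataset 3 can be generated directly; a brief sketch (Σ₀ here is a random SPD stand-in for the Yahoo Finance estimate, and the sizes are illustrative). Since β·θθᵀ/‖θ‖² is a PSD rank-one perturbation, Σ(θ) stays positive definite for any β ≥ 0.

```python
import numpy as np

def shifted_covariance(theta, sigma0, beta):
    """Σ(θ) = Σ₀ + β·θθᵀ/||θ||² — rank-one, decision-dependent covariance shift."""
    outer = np.outer(theta, theta) / float(theta @ theta)
    return sigma0 + beta * outer

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
sigma0 = M @ M.T + 5.0 * np.eye(5)   # random SPD baseline (proxy for historical Σ₀)
theta = rng.normal(size=5)
sigma = shifted_covariance(theta, sigma0, beta=0.5)
eigs = np.linalg.eigvalsh(sigma)     # all eigenvalues should stay positive
```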
Success:
  1. CONVERGENCE: ‖θ_PS(ε) - θ_SO‖₂ < 10⁻³ for ε ≥ 0.999 in ≥4/5 problem families (80% success rate).
  2. MONOTONICITY: ‖θ_PS(ε) - θ_SO‖ is monotonically decreasing in ε for ε ∈ [0.5, 0.9999] in ≥90% of (problem, seed) pairs.
  3. POWER-LAW RATE: Log-log regression R² ≥ 0.95 for ε ∈ [0.9, 0.9999] in ≥3/5 problem families.
  4. POSITIVE RATE: Estimated α > 0 with 95% CI lower bound > 0 in all 5 problem families.
  5. ANALYTICAL CONSISTENCY: Numerical α estimates within ±0.1 of analytically derived rates for quadratic case.
  6. ANALOGY SUPPORT: Ratio ‖performative error‖/‖OT error‖ converges to constant C ∈ (0, ∞) with CV < 0.1 across matched parameter pairs.
  7. REPRODUCIBILITY: All results reproducible within ±5% across 3 independent runs with different random seeds.
Failure:
  1. HARD FAILURE: ‖θ_PS(ε) - θ_SO‖ > 0.1 for ε = 0.9999 in ≥3/5 problem families — convergence claim rejected.
  2. HARD FAILURE: Analytical derivation yields θ_PS(ε→1) ≠ θ_SO for any problem satisfying boundary conditions — hypothesis refuted analytically.
  3. SOFT FAILURE: Log-log R² < 0.8 in ≥3/5 families — power-law rate claim weakened, requires alternative rate model.
  4. SOFT FAILURE: α estimates vary by >2× across problem families — rate is problem-dependent, weakening universality claim.
  5. SOFT FAILURE: Convergence requires ε > 0.9999 to achieve 10⁻³ tolerance — practical convergence is too slow to be useful.
  6. ANALOGY FAILURE: Ratio ‖performative error‖/‖OT error‖ diverges or oscillates — entropic OT analogy is superficial.
  7. NUMERICAL FAILURE: RRM fails to converge (>10,000 iterations) for >20% of problem instances — solver reliability issue, not hypothesis issue.

GPU hours: 12
Time to result: 80d
Min cost: $180
Full cost: $1,450


Implementation Sketch

# Core Implementation Architecture

import jax
import jax.numpy as jnp
from jax import grad, jit, vmap
import optax
from dataclasses import dataclass
from typing import Callable, Tuple
import numpy as np

@dataclass
class PerformativeProblem:
    """Parameterized performative optimization problem."""
    loss_fn: Callable          # ℓ(θ, z) -> scalar
    dist_map: Callable         # θ -> distribution parameters
    base_dist_params: dict     # D₀ parameters
    theta_dim: int
    epsilon: float = 0.0       # performativity parameter (0=full, 1=classical)
    
    def interpolated_dist_params(self, theta):
        """D_ε(θ) = (1-ε)·D(θ) + ε·D₀"""
        perf_params = self.dist_map(theta)
        # Interpolate distribution parameters
        return jax.tree_util.tree_map(
            lambda p, b: (1 - self.epsilon) * p + self.epsilon * b,
            perf_params, self.base_dist_params
        )
    
    def expected_loss(self, theta, n_samples=1000, key=None):
        """Monte Carlo estimate of E_{D_ε(θ)}[ℓ(θ, Z)]."""
        if key is None:
            key = jax.random.PRNGKey(0)  # avoid PRNGKey in a default arg (built once at def time)
        dist_params = self.interpolated_dist_params(theta)
        # sample_distribution: per-family sampler, defined alongside create_problem
        samples = sample_distribution(dist_params, n_samples, key)
        return jnp.mean(vmap(lambda z: self.loss_fn(theta, z))(samples))

def repeated_risk_minimization(problem: PerformativeProblem, 
                                theta_init: jnp.ndarray,
                                max_iter: int = 10000,
                                tol: float = 1e-8,
                                lr: float = 0.01) -> Tuple[jnp.ndarray, int]:
    """
    RRM: θ_{t+1} = argmin_θ E_{D(θ_t)}[ℓ(θ, Z)]
    Returns: (theta_star, n_iterations)
    """
    optimizer = optax.adam(lr)
    theta = theta_init.copy()
    opt_state = optimizer.init(theta)
    
    for t in range(max_iter):
        theta_prev = theta.copy()
        # Fix distribution at current theta, minimize over theta
        fixed_dist_params = problem.interpolated_dist_params(theta)
        
        def fixed_loss(th):
            samples = sample_distribution(fixed_dist_params, n_samples=500)
            return jnp.mean(vmap(lambda z: problem.loss_fn(th, z))(samples))
        
        # Inner optimization loop
        theta = inner_minimize(fixed_loss, theta, n_steps=100, lr=lr)
        
        if jnp.norm(theta - theta_prev) < tol:
            return theta, t
    
    return theta, max_iter  # Did not converge

def convergence_experiment(problem_family: str,
                           epsilon_values: list,
                           n_seeds: int = 10) -> dict:
    """
    Main experiment: measure ||θ*_PS(ε) - θ*_SO||₂ across ε values.
    """
    results = {'epsilon': [], 'error': [], 'seed': [], 'family': []}
    
    # Solve classical problem once
    classical_problem = create_problem(problem_family, epsilon=1.0)
    theta_SO = solve_classical(classical_problem)
    
    for eps in epsilon_values:
        for seed in range(n_seeds):
            key = jax.random.PRNGKey(seed)
            theta_init = jax.random.normal(key, (classical_problem.theta_dim,))
            
            perf_problem = create_problem(problem_family, epsilon=eps)
            theta_PS, n_iter = repeated_risk_minimization(
                perf_problem, theta_init
            )
            
            error = float(jnp.linalg.norm(theta_PS - theta_SO))
            results['epsilon'].append(eps)
            results['error'].append(error)
            results['seed'].append(seed)
            results['family'].append(problem_family)
    
    return results

def estimate_convergence_rate(results: dict) -> dict:
    """
    Fit power law: log(error) = log(C) + α·log(1-ε)
    Returns: {'alpha': float, 'alpha_ci': (float, float), 'r_squared': float}
    """
    eps = np.array(results['epsilon'])
    errors = np.array(results['error'])
    
    # Filter to high-ε regime
    mask = eps >= 0.9
    log_x = np.log(1 - eps[mask])
    log_y = np.log(errors[mask] + 1e-12)
    
    # OLS fit
    A = np.column_stack([np.ones_like(log_x), log_x])
    coeffs, residuals, _, _ = np.linalg.lstsq(A, log_y, rcond=None)
    log_C, alpha = coeffs
    
    # Bootstrap CI
    alpha_bootstrap = []
    for _ in range(1000):
        idx = np.random.choice(len(log_x), len(log_x), replace=True)
        c, _, _, _ = np.linalg.lstsq(A[idx], log_y[idx], rcond=None)
        alpha_bootstrap.append(c[1])
    
    r_squared = 1 - np.var(log_y - A @ coeffs) / np.var(log_y)
    
    return {
        'alpha': alpha,
        'alpha_ci': (np.percentile(alpha_bootstrap, 2.5),
                     np.percentile(alpha_bootstrap, 97.5)),
        'r_squared': r_squared,
        'C': np.exp(log_C)
    }

# ANALYTICAL VERIFICATION: Quadratic case
# ℓ(θ,z) = ½||θ-z||², D(θ) = N(Aθ+b, Σ), requires ||A|| < 1
# ε=0 (full performativity): θ*_PS = (I-A)⁻¹b
# D_ε(θ) = N((1-ε)·(Aθ+b) + ε·b, Σ) = N((1-ε)Aθ + b, Σ)
# θ*_PS(ε) = (I-(1-ε)A)⁻¹b
# As ε→1: θ*_PS(ε) → (I-0)⁻¹b = b = θ*_SO ✓
# Rate: ||θ*_PS(ε) - θ*_SO|| = ||(I-(1-ε)A)⁻¹(1-ε)A·b|| = O(1-ε)

def analytical_quadratic_solution(A, b, epsilon):
    """Closed-form θ*_PS(ε) for quadratic performative problem."""
    I = jnp.eye(A.shape[0])
    return jnp.linalg.solve(I - (1 - epsilon) * A, b)

# MAIN EXECUTION
if __name__ == "__main__":
    epsilon_grid = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 
                    0.95, 0.99, 0.999, 0.9999]
    problem_families = ['quadratic', 'linear_regression', 
                        'portfolio', 'strategic_classification',
                        'credit_scoring']
    
    all_results = {}
    for family in problem_families:
        print(f"Running experiments for {family}...")
        results = convergence_experiment(family, epsilon_grid, n_seeds=10)
        rate_info = estimate_convergence_rate(results)
        all_results[family] = {'data': results, 'rate': rate_info}
        print(f"  α = {rate_info['alpha']:.3f} "
              f"[{rate_info['alpha_ci'][0]:.3f}, {rate_info['alpha_ci'][1]:.3f}], "
              f"R² = {rate_info['r_squared']:.4f}")
    
    # Check success criterion 1: error < 1e-3 at the highest ε in ≥4/5 families
    n_converged = sum(
        min(r['data']['error'][-10:]) < 1e-3  # last 10 entries = highest ε
        for r in all_results.values()
    )
    print(f"\nConvergence criterion met: {n_converged >= 4} ({n_converged}/5 families)")
Abort checkpoints:
  1. DAY 10 — ANALYTICAL CHECKPOINT: If closed-form derivation for quadratic case yields θ_PS(ε→1) ≠ θ_SO, abort and report analytical counterexample. Estimated probability of abort: 5%.
  2. DAY 20 — SOLVER VALIDATION CHECKPOINT: If RRM solver fails to achieve <10⁻⁶ optimality gap on analytical test cases in >30% of instances, abort numerical phase and fix solver before proceeding. Do not proceed with unreliable solver.
  3. DAY 32 — EARLY CONVERGENCE SIGNAL: Compute ‖θ_PS(0.99) - θ_SO‖ for all 5 problem families. If >3/5 show error > 0.1, abort full sweep and investigate root cause (theory error vs. implementation error).
  4. DAY 45 — RATE ESTIMATION CHECKPOINT: If R² < 0.5 for log-log fit in ≥4/5 families, the power-law rate model is wrong. Abort rate estimation phase; pivot to characterizing actual convergence structure.
  5. DAY 55 — STRESS TEST CHECKPOINT: If near-boundary conditions (β = 0.95·μ/L) cause >50% of experiments to fail to converge, document instability regime and restrict claims to β < 0.5·μ/L.
  6. DAY 65 — ANALOGY CHECKPOINT: If the ε ↔ λ mapping cannot be constructed (ratio diverges), abort analogy formalization and reframe as independent convergence result without OT analogy.
  7. CONTINUOUS — COST CHECKPOINT: If AWS/GCP costs exceed $800 (55% of full budget) before Day 50, switch to CPU-only computation for remaining experiments and reduce n_seeds from 10 to 5.

Source

AegisMind Research