Performative scenario optimization solutions converge to classical stochastic optimization solutions in the limit where the decision's influence on the data-generating process vanishes, analogous to how entropic optimal transport converges to classical optimal transport as regularization approaches zero.
Adversarial Debate Score
67% survival rate under critique
Model Critiques
Supporting Research Papers
- Performative Scenario Optimization
This paper introduces a performative scenario optimization framework for decision-dependent chance-constrained problems. Unlike classical stochastic optimization, we account for the feedback loop wher...
- ParetoEnsembles.jl: A Julia Package for Multiobjective Parameter Estimation Using Pareto Optimal Ensemble Techniques
Mathematical models of natural and man-made systems often have many adjustable parameters that must be estimated from multiple, potentially conflicting datasets. Rather than reporting a single best-fi...
- On Lipschitzian properties of multifunctions defined implicitly by "split" feasibility problems
In the present paper, a systematic study is made of quantitative semicontinuity (a.k.a. Lipschitzian) properties of certain multifunctions, which are defined as a solution map associated to a family o...
- Sampling at intermediate temperatures is optimal for training large language models in protein structure prediction
We investigate the parameter space of transformer models trained on protein sequence data using a statistical mechanics framework, sampling the loss landscape at varying temperatures by Langevin dynam...
Formal Verification
Z3 checks whether the hypothesis is internally consistent, not whether it is empirically true.
This discovery has a Claude-generated validation package with a full experimental design.
Precise Hypothesis
For a performative scenario optimization problem parameterized by a decision variable θ ∈ Θ ⊆ ℝ^d, let D(θ) denote the decision-dependent data distribution and D₀ the baseline (decision-independent) distribution. Define the influence parameter ε ≥ 0 such that D_ε(θ) = (1-ε)·D(θ) + ε·D₀ interpolates between full performativity (ε = 0) and no performativity (ε = 1). The hypothesis states: as ε → 1 (equivalently, as the Wasserstein distance W₂(D_ε(θ), D₀) → 0 uniformly in θ), the performatively stable solution θ_PS(ε) converges in norm to the classical stochastic optimization solution θ_SO = argmin_θ E_{D₀}[ℓ(θ, Z)], with convergence rate ‖θ_PS(ε) - θ_SO‖ = O((1-ε)^α) for some α > 0. This is falsifiable: if the limit does not exist, if convergence is non-monotone, or if the rate is slower than any polynomial, the hypothesis is refuted.
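The quadratic case treated in the implementation sketch below makes this limit easy to check numerically. Here is a minimal sketch (the matrix A and vector b are hypothetical toy values; it uses the closed form θ_PS(ε) = (I - (1-ε)A)⁻¹b derived for quadratic loss with Gaussian mean-shift performativity, for which θ_SO = b):

```python
import numpy as np

# Toy quadratic instance: loss ½‖θ-z‖², D(θ) = N(Aθ + b, Σ), spectral norm ‖A‖ < 1.
# Closed form: θ_PS(ε) = (I - (1-ε)A)⁻¹ b, and θ_SO = b (the mean of D₀).
A = np.array([[0.5, 0.1],
              [0.0, 0.3]])       # hypothetical contraction, ‖A‖ < 1
b = np.array([1.0, -2.0])        # hypothetical baseline mean
theta_SO = b

def theta_PS(eps):
    """Performatively stable point of the interpolated problem D_ε(θ)."""
    return np.linalg.solve(np.eye(2) - (1.0 - eps) * A, b)

errors = [np.linalg.norm(theta_PS(e) - theta_SO) for e in (0.9, 0.99, 0.999)]
# The error shrinks by roughly 10× per decade of (1-ε), consistent with α ≈ 1.
```

Each tenfold reduction in (1-ε) shrinks the error about tenfold, which is the α = 1 rate the analytical section predicts for this problem class.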
- PRIMARY DISPROOF: Numerical demonstration that ‖θ_PS(ε) - θ_SO‖ fails to converge to 0 as ε → 1 in ≥3 distinct well-conditioned test problems satisfying the boundary conditions, with tolerance < 10⁻³.
- RATE DISPROOF: If log‖θ_PS(ε) - θ_SO‖ vs. log(1-ε) shows no linear trend (R² < 0.8) across ε ∈ [0.9, 0.9999], convergence rate claim is refuted.
- COUNTEREXAMPLE: A single analytically tractable example satisfying all boundary conditions where the limit exists but equals some point θ∞ ≠ θ_SO.
- NON-MONOTONE CONVERGENCE: If ‖θ_PS(ε) - θ_SO‖ increases as ε increases from 0.95 to 0.9999 in >50% of test cases, the monotone convergence claim is refuted.
- INSTABILITY DISPROOF: If the performatively stable solution θ_PS(ε) is non-unique or discontinuous in ε for problems satisfying β < μ/L, the theoretical framework is undermined.
- ANALOGY BREAKDOWN: If the convergence structure is qualitatively different from entropic OT (e.g., requires different scaling, exhibits phase transitions), the analogical claim is weakened even if convergence holds.
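The RATE DISPROOF check above is a few lines of ordinary least squares. A minimal sketch (`loglog_fit` is a hypothetical helper name; the synthetic data has a known exponent so the fit can be verified):

```python
import numpy as np

def loglog_fit(eps, err):
    """Fit log(err) = log C + α·log(1-ε); return (alpha, r_squared)."""
    x = np.log(1.0 - np.asarray(eps, dtype=float))
    y = np.log(np.asarray(err, dtype=float))
    A = np.column_stack([np.ones_like(x), x])   # [intercept, slope] design
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef[1], 1.0 - resid.var() / y.var()

eps = np.array([0.9, 0.95, 0.99, 0.999, 0.9999])
alpha, r2 = loglog_fit(eps, 2.0 * (1.0 - eps) ** 1.5)  # exact power law, α = 1.5
# alpha ≈ 1.5, r2 ≈ 1.0 → the R² ≥ 0.8 criterion is satisfied
```

On data that truly follows a power law the fit recovers α exactly; an R² below 0.8 on real runs would indicate the power-law model itself is wrong, triggering the disproof.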
Experimental Protocol
PHASE 1 — Analytical Verification (Days 1–14): Derive closed-form solutions for 3 tractable problem classes: (A) quadratic loss with Gaussian performative distribution, (B) linear regression with mean-shift performativity, (C) portfolio optimization with covariance-shift performativity. Verify convergence analytically and compute exact rates.
PHASE 2 — Numerical Convergence Study (Days 15–35): Implement gradient-based repeated risk minimization (RRM) and performative gradient descent across ε ∈ {0.0, 0.1, 0.2, ..., 0.9, 0.95, 0.99, 0.999, 0.9999}. Run 5 problem families × 10 random seeds × 14 ε values = 700 experiments. Measure ‖θ_PS(ε) - θ_SO‖₂ at convergence.
PHASE 3 — Rate Estimation (Days 36–50): Fit power-law model ‖error‖ = C·(1-ε)^α via log-log regression. Bootstrap confidence intervals (n=1000 resamples). Compare α estimates across problem families.
PHASE 4 — Stress Testing (Days 51–65): Test near-boundary conditions: β approaching μ/L, non-smooth losses (Huber), heavy-tailed distributions (Student-t with ν=3), and high-dimensional θ (d ∈ {10, 100, 1000}).
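For Phase 4, the non-smooth loss can be the standard Huber function; a minimal sketch (the transition point δ = 1.0 is an illustrative default, not a value fixed by the protocol):

```python
import numpy as np

def huber(r, delta=1.0):
    """Huber loss: quadratic for |r| ≤ δ, linear beyond. The gradient is
    Lipschitz but the second derivative jumps at ±δ — the non-smoothness
    the stress test is designed to probe."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

# huber(0.5) = 0.125 (quadratic branch); huber(2.0) = 1.5 (linear branch)
```

Swapping this in for the quadratic loss keeps the smoothness constant L finite while breaking the strong second-order structure the analytical rates rely on.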
PHASE 5 — Analogy Formalization (Days 66–80): Construct explicit mathematical mapping between performativity parameter ε and entropic OT regularization λ. Verify whether convergence rates match under this mapping.
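Phase 5 needs an entropic OT baseline to compare convergence rates against. A minimal Sinkhorn sketch (toy 1D marginals chosen for illustration; the property being mirrored is that the regularized transport cost approaches the unregularized cost as λ → 0, analogous to ε → 1):

```python
import numpy as np

def sinkhorn_plan(a, b, C, reg, n_iter=2000):
    """Entropic OT transport plan via Sinkhorn matrix scaling."""
    K = np.exp(-C / reg)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # match column marginal b
        u = a / (K @ v)                  # match row marginal a
    return u[:, None] * K * v[None, :]

# Toy 1D problem: uniform marginals, squared cost; the exact OT cost is 0.25
# (monotone coupling of sorted points).
x = np.array([0.0, 1.0, 2.0]); y = x + 0.5
C = (x[:, None] - y[None, :]) ** 2
a = b = np.full(3, 1.0 / 3.0)
costs = [np.sum(sinkhorn_plan(a, b, C, reg) * C) for reg in (0.5, 0.05)]
# The λ = 0.05 cost is far closer to 0.25 than the λ = 0.5 cost.
```

If the ε ↔ λ mapping is sound, the error curves ‖θ_PS(ε) - θ_SO‖ and |cost(λ) - cost(0)| should collapse onto matching power laws under the constructed mapping.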
- SYNTHETIC — Quadratic performative problems: Generated programmatically. Parameters: d ∈ {2, 10, 50, 100}, β ∈ {0.01, 0.1, 0.5·μ/L}, μ ∈ {0.1, 1.0, 10.0}. Size: ~10,000 problem instances. Generation cost: negligible (CPU only).
- SYNTHETIC — Linear regression with distribution shift: n_samples ∈ {100, 1000, 10000} per distribution, feature dimension d ∈ {5, 20, 100}. Performativity: mean shift proportional to θ with coefficient β.
- SYNTHETIC — Portfolio optimization: Assets k ∈ {5, 20, 50}, covariance matrix shift model Σ(θ) = Σ₀ + β·θθᵀ/‖θ‖². Historical return data from Yahoo Finance (2010–2023) as D₀ baseline.
- REAL-WORLD PROXY — Strategic classification dataset: UCI Adult Income dataset (48,842 samples, 14 features) with simulated strategic response model as performativity proxy. Available at: https://archive.ics.uci.edu/ml/datasets/adult.
- REAL-WORLD PROXY — Credit scoring with feedback: Kaggle Give Me Some Credit dataset (150,000 samples) with loan approval → repayment behavior feedback loop model.
- REFERENCE IMPLEMENTATION: Perdomo et al. (2020) performative prediction codebase (GitHub: gradientinstitute/performative-prediction) as baseline comparison.
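The covariance-shift model from the portfolio dataset above is a rank-one update; a minimal sketch (the baseline Σ₀ values are hypothetical; positive semidefiniteness is preserved for β ≥ 0 because θθᵀ/‖θ‖² is PSD):

```python
import numpy as np

def shifted_cov(sigma0, theta, beta):
    """Decision-dependent covariance Σ(θ) = Σ₀ + β·θθᵀ/‖θ‖²."""
    t = np.asarray(theta, dtype=float)
    return sigma0 + beta * np.outer(t, t) / (t @ t)

sigma0 = np.diag([0.04, 0.09])                    # hypothetical baseline covariance
sigma = shifted_cov(sigma0, np.array([1.0, 1.0]), beta=0.1)
# Σ(θ) - Σ₀ = β·θθᵀ/‖θ‖² ⪰ 0, so Σ(θ) remains a valid covariance for β ≥ 0.
```

The normalization by ‖θ‖² keeps the shift magnitude at β regardless of portfolio scale, so β alone controls the performativity strength.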
- CONVERGENCE: ‖θ_PS(ε) - θ_SO‖₂ < 10⁻³ for ε ≥ 0.999 in ≥4/5 problem families (80% success rate).
- MONOTONICITY: ‖θ_PS(ε) - θ_SO‖ is monotonically decreasing in ε for ε ∈ [0.5, 0.9999] in ≥90% of (problem, seed) pairs.
- POWER-LAW RATE: Log-log regression R² ≥ 0.95 for ε ∈ [0.9, 0.9999] in ≥3/5 problem families.
- POSITIVE RATE: Estimated α > 0 with 95% CI lower bound > 0 in all 5 problem families.
- ANALYTICAL CONSISTENCY: Numerical α estimates within ±0.1 of analytically derived rates for quadratic case.
- ANALOGY SUPPORT: Ratio ‖performative error‖/‖OT error‖ converges to constant C ∈ (0, ∞) with CV < 0.1 across matched parameter pairs.
- REPRODUCIBILITY: All results reproducible within ±5% across 3 independent runs with different random seeds.
- HARD FAILURE: ‖θ_PS(ε) - θ_SO‖ > 0.1 for ε = 0.9999 in ≥3/5 problem families — convergence claim rejected.
- HARD FAILURE: Analytical derivation yields θ_PS(ε→1) ≠ θ_SO for any problem satisfying boundary conditions — hypothesis refuted analytically.
- SOFT FAILURE: Log-log R² < 0.8 in ≥3/5 families — power-law rate claim weakened, requires alternative rate model.
- SOFT FAILURE: α estimates vary by >2× across problem families — rate is problem-dependent, weakening universality claim.
- SOFT FAILURE: Convergence requires ε > 0.9999 to achieve 10⁻³ tolerance — practical convergence is too slow to be useful.
- ANALOGY FAILURE: Ratio ‖performative error‖/‖OT error‖ diverges or oscillates — entropic OT analogy is superficial.
- NUMERICAL FAILURE: RRM fails to converge (>10,000 iterations) for >20% of problem instances — solver reliability issue, not hypothesis issue.
12
GPU hours
80d
Time to result
$180
Min cost
$1,450
Full cost
ROI Projection
Implementation Sketch
```python
# Core Implementation Architecture
import jax
import jax.numpy as jnp
from jax import vmap
from dataclasses import dataclass
from typing import Callable, Tuple
import numpy as np

# NOTE: sample_distribution, inner_minimize, create_problem, and
# solve_classical are problem-family-specific helpers defined elsewhere.

@dataclass
class PerformativeProblem:
    """Parameterized performative optimization problem."""
    loss_fn: Callable        # ℓ(θ, z) -> scalar
    dist_map: Callable       # θ -> distribution parameters
    base_dist_params: dict   # D₀ parameters
    theta_dim: int
    epsilon: float = 0.0     # performativity parameter (0 = full, 1 = classical)

    def interpolated_dist_params(self, theta):
        """D_ε(θ) = (1-ε)·D(θ) + ε·D₀, interpolated parameter-wise."""
        perf_params = self.dist_map(theta)
        return jax.tree_util.tree_map(
            lambda p, b: (1 - self.epsilon) * p + self.epsilon * b,
            perf_params, self.base_dist_params
        )

    def expected_loss(self, theta, n_samples=1000, key=jax.random.PRNGKey(0)):
        """Monte Carlo estimate of E_{D_ε(θ)}[ℓ(θ, Z)]."""
        dist_params = self.interpolated_dist_params(theta)
        samples = sample_distribution(dist_params, n_samples, key)
        return jnp.mean(vmap(lambda z: self.loss_fn(theta, z))(samples))


def repeated_risk_minimization(problem: PerformativeProblem,
                               theta_init: jnp.ndarray,
                               max_iter: int = 10000,
                               tol: float = 1e-8,
                               lr: float = 0.01) -> Tuple[jnp.ndarray, int]:
    """
    RRM: θ_{t+1} = argmin_θ E_{D_ε(θ_t)}[ℓ(θ, Z)]
    Returns: (theta_star, n_iterations)
    """
    theta = theta_init
    for t in range(max_iter):
        theta_prev = theta
        # Fix the distribution at the current theta, then minimize over theta.
        fixed_dist_params = problem.interpolated_dist_params(theta)

        def fixed_loss(th):
            samples = sample_distribution(fixed_dist_params, n_samples=500)
            return jnp.mean(vmap(lambda z: problem.loss_fn(th, z))(samples))

        # Inner optimization (e.g. Adam via optax, wrapped in inner_minimize)
        theta = inner_minimize(fixed_loss, theta, n_steps=100, lr=lr)
        if jnp.linalg.norm(theta - theta_prev) < tol:
            return theta, t
    return theta, max_iter  # did not converge


def convergence_experiment(problem_family: str, epsilon_values: list,
                           n_seeds: int = 10) -> dict:
    """Main experiment: measure ‖θ_PS(ε) - θ_SO‖₂ across ε values."""
    results = {'epsilon': [], 'error': [], 'seed': [], 'family': []}
    # Solve the classical (ε = 1) problem once.
    classical_problem = create_problem(problem_family, epsilon=1.0)
    theta_SO = solve_classical(classical_problem)
    for eps in epsilon_values:
        for seed in range(n_seeds):
            key = jax.random.PRNGKey(seed)
            theta_init = jax.random.normal(key, (classical_problem.theta_dim,))
            perf_problem = create_problem(problem_family, epsilon=eps)
            theta_PS, n_iter = repeated_risk_minimization(perf_problem, theta_init)
            results['epsilon'].append(eps)
            results['error'].append(float(jnp.linalg.norm(theta_PS - theta_SO)))
            results['seed'].append(seed)
            results['family'].append(problem_family)
    return results


def estimate_convergence_rate(results: dict) -> dict:
    """
    Fit power law: log(error) = log(C) + α·log(1-ε)
    Returns: {'alpha', 'alpha_ci', 'r_squared', 'C'}
    """
    eps = np.array(results['epsilon'])
    errors = np.array(results['error'])
    # Restrict to the high-ε regime where the asymptotic rate applies.
    mask = eps >= 0.9
    log_x = np.log(1 - eps[mask])
    log_y = np.log(errors[mask] + 1e-12)
    # OLS fit
    A = np.column_stack([np.ones_like(log_x), log_x])
    coeffs, _, _, _ = np.linalg.lstsq(A, log_y, rcond=None)
    log_C, alpha = coeffs
    # Bootstrap CI for α (n = 1000 resamples)
    alpha_bootstrap = []
    for _ in range(1000):
        idx = np.random.choice(len(log_x), len(log_x), replace=True)
        c, _, _, _ = np.linalg.lstsq(A[idx], log_y[idx], rcond=None)
        alpha_bootstrap.append(c[1])
    r_squared = 1 - np.var(log_y - A @ coeffs) / np.var(log_y)
    return {
        'alpha': alpha,
        'alpha_ci': (np.percentile(alpha_bootstrap, 2.5),
                     np.percentile(alpha_bootstrap, 97.5)),
        'r_squared': r_squared,
        'C': np.exp(log_C),
    }


# ANALYTICAL VERIFICATION: quadratic case
# ℓ(θ, z) = ½‖θ - z‖²,  D(θ) = N(Aθ + b, Σ),  requires ‖A‖ < 1
# θ_PS(0) = (I - A)⁻¹ b
# D_ε(θ) = N((1-ε)·(Aθ+b) + ε·b, Σ) = N((1-ε)Aθ + b, Σ)
# θ_PS(ε) = (I - (1-ε)A)⁻¹ b
# As ε → 1: θ_PS(ε) → b = θ_SO ✓
# Rate: ‖θ_PS(ε) - θ_SO‖ = ‖(I - (1-ε)A)⁻¹ b - b‖ ≈ (1-ε)·‖A b‖ = O(1-ε)

def analytical_quadratic_solution(A, b, epsilon):
    """Closed-form θ_PS(ε) for the quadratic performative problem."""
    I = jnp.eye(A.shape[0])
    return jnp.linalg.solve(I - (1 - epsilon) * A, b)


# MAIN EXECUTION
if __name__ == "__main__":
    epsilon_grid = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95, 0.99, 0.999, 0.9999]
    problem_families = ['quadratic', 'linear_regression', 'portfolio',
                        'strategic_classification', 'credit_scoring']
    all_results = {}
    for family in problem_families:
        print(f"Running experiments for {family}...")
        results = convergence_experiment(family, epsilon_grid, n_seeds=10)
        rate_info = estimate_convergence_rate(results)
        all_results[family] = {'data': results, 'rate': rate_info}
        print(f"  α = {rate_info['alpha']:.3f} "
              f"[{rate_info['alpha_ci'][0]:.3f}, {rate_info['alpha_ci'][1]:.3f}], "
              f"R² = {rate_info['r_squared']:.4f}")

    # Convergence success criterion at the highest ε values.
    convergence_check = all(
        min(r['data']['error'][-10:]) < 1e-3  # last 10 entries = highest ε, all seeds
        for r in all_results.values()
    )
    print(f"\nConvergence criterion met: {convergence_check}")
```
- DAY 10 — ANALYTICAL CHECKPOINT: If closed-form derivation for quadratic case yields θ_PS(ε→1) ≠ θ_SO, abort and report analytical counterexample. Estimated probability of abort: 5%.
- DAY 20 — SOLVER VALIDATION CHECKPOINT: If RRM solver fails to achieve <10⁻⁶ optimality gap on analytical test cases in >30% of instances, abort numerical phase and fix solver before proceeding. Do not proceed with unreliable solver.
- DAY 32 — EARLY CONVERGENCE SIGNAL: Compute ‖θ_PS(0.99) - θ_SO‖ for all 5 problem families. If >3/5 show error > 0.1, abort full sweep and investigate root cause (theory error vs. implementation error).
- DAY 45 — RATE ESTIMATION CHECKPOINT: If R² < 0.5 for log-log fit in ≥4/5 families, the power-law rate model is wrong. Abort rate estimation phase; pivot to characterizing actual convergence structure.
- DAY 55 — STRESS TEST CHECKPOINT: If near-boundary conditions (β = 0.95·μ/L) cause >50% of experiments to fail to converge, document instability regime and restrict claims to β < 0.5·μ/L.
- DAY 65 — ANALOGY CHECKPOINT: If the ε ↔ λ mapping cannot be constructed (ratio diverges), abort analogy formalization and reframe as independent convergence result without OT analogy.
- CONTINUOUS — COST CHECKPOINT: If AWS/GCP costs exceed $800 (55% of full budget) before Day 50, switch to CPU-only computation for remaining experiments and reduce n_seeds from 10 to 5.