Adaptive sampling in parametrized dynamical systems can improve the efficiency of surrogate-based amortized optimization when problem parameters vary continuously.
Adversarial Debate Score
63% survival rate under critique
Model Critiques
Supporting Research Papers
- Cheap Thrills: Effective Amortized Optimization Using Inexpensive Labels
To scale the solution of optimization and simulation problems, prior work has explored machine-learning surrogates that inexpensively map problem parameters to corresponding solutions. Commonly used a...
- FlashOptim: Optimizers for Memory Efficient Training
Standard mixed-precision training of neural networks requires many bytes of accelerator memory for each model parameter. These bytes reflect not just the parameter itself, but also its gradient and on...
- Universal Persistent Brownian Motions in Confluent Tissues
Biological tissues are active materials whose non-equilibrium dynamics emerge from distinct cellular force-generating mechanisms. Using a two-dimensional active foam model, we compare the effects of t...
- Toward Expert Investment Teams: A Multi-Agent LLM System with Fine-Grained Trading Tasks
The advancement of large language models (LLMs) has accelerated the development of autonomous financial trading systems. While mainstream approaches deploy multi-agent systems mimicking analyst and ma...
Formal Verification
Z3 checks whether the hypothesis is internally consistent, not whether it is empirically true.
This discovery has a Claude-generated validation package with a full experimental design.
Precise Hypothesis
Adaptive sampling strategies applied to the parameter space of parametrized dynamical systems will reduce the total number of high-fidelity simulator evaluations required to achieve a fixed surrogate model accuracy (measured by normalized RMSE ≤ 0.05) compared to uniform/random sampling baselines, when used within a surrogate-based amortized optimization pipeline and when problem parameters vary continuously over a compact domain. Specifically, adaptive sampling will achieve equivalent optimization quality (within 2% of optimal objective value) using ≤ 50% of the simulator calls required by uniform sampling.
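The hypothesis fixes the accuracy target (normalized RMSE ≤ 0.05) but not the normalization convention. A minimal sketch of one common choice, RMSE divided by the target range; this normalization is an assumption, not something the hypothesis specifies:

```python
import numpy as np

def nrmse(y_pred, y_true):
    """RMSE normalized by the target range (one common convention;
    the choice of normalization here is an assumption)."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return float(rmse / (y_true.max() - y_true.min()))

y_true = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_pred = y_true + 0.1
print(nrmse(y_pred, y_true))  # → 0.025 (= 0.1 / 4.0)
```

Dividing by the target standard deviation instead is equally defensible; whichever convention is used must be fixed before the threshold comparison is meaningful.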
Falsification Conditions
- Adaptive sampling requires ≥ 90% of the simulator calls of uniform sampling to achieve equivalent surrogate RMSE (≤ 0.05) across 3+ benchmark systems — no meaningful efficiency gain.
- Optimization quality under adaptive-surrogate pipeline is statistically worse (p < 0.05, paired t-test) than uniform-surrogate pipeline at matched simulator budgets.
- Adaptive sampling overhead (query selection time) exceeds 20% of total wall-clock time, negating computational savings.
- Surrogate trained on adaptive samples exhibits higher variance in optimization outcomes (std > 2× that of uniform baseline) across 10 independent runs, indicating instability.
- On ≥ 2 of 3 benchmark dynamical systems, adaptive sampling provides < 10% reduction in simulator calls at any fixed accuracy threshold.
- A simple heuristic (e.g., Latin Hypercube Sampling) matches or outperforms adaptive sampling on all benchmarks.
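Several conditions above use Latin Hypercube Sampling as the reference baseline; it is available directly in SciPy's quasi-Monte Carlo module. A minimal sketch with illustrative bounds (the Van der Pol damping range comes from the benchmark list below; the second parameter range is hypothetical):

```python
from scipy.stats.qmc import LatinHypercube, scale

# 20 Latin-Hypercube samples over an illustrative 2-D box:
# Van der Pol damping mu in [0.1, 5.0] crossed with a hypothetical
# second parameter in [0.0, 1.0].
sampler = LatinHypercube(d=2, seed=0)
unit = sampler.random(n=20)                               # points in [0, 1)^2
X = scale(unit, l_bounds=[0.1, 0.0], u_bounds=[5.0, 1.0])
print(X.shape)  # → (20, 2)
```

By construction each of the 20 equal-width strata along every axis receives exactly one sample, which is why LHS is a strong "smart initialization" baseline.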
Experimental Protocol
Minimum Viable Test (MVT): Compare adaptive sampling (uncertainty-guided, e.g., active learning with GP or ensemble) vs. uniform random sampling vs. Latin Hypercube Sampling (LHS) as surrogate training data acquisition strategies on 3 parametrized dynamical systems of increasing complexity. Evaluate surrogate accuracy and downstream optimization quality as a function of simulator call budget. Run 10 independent trials per condition. Primary metric: simulator calls to reach RMSE ≤ 0.05 and optimization gap ≤ 2%.
Full Validation: Extend to 6 benchmark systems, 5 adaptive strategies, ablation over parameter space dimensionality (d = 2, 5, 10, 20), and real-world physics simulators (e.g., fluid dynamics, epidemiological models). Include wall-clock time profiling and scalability analysis.
Benchmark Systems & Resources
- Lorenz-63 system (parametrized by σ, ρ, β ∈ continuous ranges) — synthetic, self-generated via scipy/Julia DifferentialEquations.jl.
- Van der Pol oscillator (parametrized by damping coefficient μ ∈ [0.1, 5.0]) — synthetic.
- Parametrized 2D Navier-Stokes (viscosity ν, forcing amplitude f) — available via FEniCS or neuraloperator benchmark suite (https://github.com/neuraloperator/neuraloperator).
- SIR epidemiological model (β_infection, γ_recovery ∈ continuous ranges) — synthetic.
- Duffing oscillator (nonlinearity parameter, damping) — synthetic.
- (Full validation) OpenFOAM or DOLFIN-based PDE benchmark with ≥ 5 continuous parameters.
- Pre-trained neural operator checkpoints (FNO, DeepONet) from public repositories for transfer learning baselines.
- GPyTorch / BoTorch for GP-based adaptive sampling implementation.
- PyTorch Ensemble models for deep ensemble UQ baseline.
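The primary metric throughout, simulator calls to reach a fixed accuracy, reduces to scanning a learning curve for the first budget that clears the threshold. A sketch with fabricated RMSE curves for illustration:

```python
def calls_to_threshold(budgets, mean_rmse, threshold=0.05):
    """Smallest budget whose mean RMSE is at or below the threshold.

    Returns None if the threshold is never reached.
    """
    for b, r in sorted(zip(budgets, mean_rmse)):
        if r <= threshold:
            return b
    return None

# Fabricated learning curves, for illustration only:
budgets = [100, 150, 200, 300]
adaptive_rmse = [0.12, 0.06, 0.045, 0.03]
uniform_rmse = [0.15, 0.09, 0.07, 0.048]

print(calls_to_threshold(budgets, adaptive_rmse))  # → 200
print(calls_to_threshold(budgets, uniform_rmse))   # → 300
```

The "≤ 50% of simulator calls" criterion then compares these two budgets directly.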
Success Criteria
- Adaptive sampling achieves RMSE ≤ 0.05 using ≤ 50% of simulator calls vs. uniform sampling on ≥ 2 of 3 benchmark systems (primary criterion).
- Optimization gap ≤ 2% achieved by adaptive strategy at ≤ 60% of the simulator budget required by uniform sampling on ≥ 2 of 3 systems.
- Statistical significance: p < 0.05 (Wilcoxon signed-rank) for simulator call savings on ≥ 2 systems.
- Adaptive query selection overhead < 5% of total wall-clock time for d ≤ 10.
- Results replicate across ≥ 8 of 10 independent trials (80% replication rate).
- Efficiency gain increases monotonically with the simulator-cost-to-surrogate-inference-cost ratio (rank correlation r > 0.7).
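The last criterion requires correlation r > 0.7 between efficiency gain and cost ratio; since the claimed relationship is monotone, a rank (Spearman) correlation is the natural reading. A dependency-free sketch (assumes no ties), with fabricated gains for illustration:

```python
import numpy as np

def spearman_r(x, y):
    """Spearman rank correlation (assumes no ties, as in this criterion)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

# Fabricated example: efficiency gain vs. simulator-cost ratio.
cost_ratio = np.array([1e1, 1e2, 1e3, 1e4, 1e5])
gain = np.array([0.05, 0.18, 0.31, 0.42, 0.50])
print(spearman_r(cost_ratio, gain) > 0.7)  # → True (perfectly monotone, r = 1)
```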
Failure Criteria
- Adaptive sampling requires > 80% of uniform sampling budget to reach RMSE ≤ 0.05 on all 3 benchmark systems.
- Optimization gap under adaptive pipeline exceeds 5% at any matched budget where uniform achieves ≤ 2%.
- Query selection overhead exceeds 20% of total wall-clock time for d ≤ 10.
- Results are not reproducible: > 3 of 10 trials show qualitatively different outcomes (e.g., adaptive worse than uniform).
- LHS alone matches adaptive sampling performance within 5% on all metrics — adaptive adds no value over smart initialization.
- Surrogate RMSE fails to decrease monotonically with budget for adaptive strategy (indicating instability) in ≥ 2 systems.
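The last failure condition checks whether mean RMSE decreases monotonically with budget; this is a one-line test, sketched here with fabricated curves:

```python
import numpy as np

def monotone_decreasing(mean_rmse, tol=0.0):
    """True if mean RMSE never increases by more than tol as budget grows."""
    r = np.asarray(mean_rmse, dtype=float)
    return bool((np.diff(r) <= tol).all())

print(monotone_decreasing([0.12, 0.08, 0.06, 0.05]))  # → True (stable)
print(monotone_decreasing([0.12, 0.08, 0.09, 0.05]))  # → False (instability)
```

A small positive `tol` may be warranted in practice to ignore noise-level upticks; the zero default is a strict reading of the criterion.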
- GPU hours: 480
- Time to result: 45 days
- Min cost: $1,200
- Full cost: $8,500
ROI Projection
- Engineering simulation software (ANSYS, Siemens, Dassault): adaptive surrogate training modules could be integrated as premium features, estimated market value $10M–$50M in licensing.
- Pharmaceutical/biotech: parametrized ODE models for drug PK/PD optimization — 50% reduction in simulation costs for clinical trial design optimization.
- Climate/weather modeling: adaptive sampling for parametrized climate models reduces ensemble simulation costs for uncertainty quantification.
- Autonomous systems: real-time adaptive surrogate updating for robotics and control systems operating in varying environments.
- Financial modeling: parametrized stochastic differential equations for option pricing and risk optimization — faster calibration pipelines.
- Defense/aerospace: aerodynamic shape optimization with CFD surrogates — direct cost reduction in design cycles.
- Estimated total addressable market for adaptive surrogate optimization tools: $200M–$2B over 10 years across engineering simulation verticals.
- Direct computational savings: 50% reduction in simulator calls translates to 50% reduction in HPC costs for surrogate training pipelines. For a typical engineering optimization workflow consuming $100K/year in simulation costs, this yields $50K/year savings per project.
- Accelerated design cycles: reducing surrogate training time from weeks to days enables 3–5× more design iterations per project timeline.
- Democratization: smaller labs with limited HPC budgets can access surrogate-based optimization previously requiring large compute clusters.
- Estimated aggregate research impact: if adopted across 1,000 active computational physics/engineering groups, potential savings of $50M–$500M/year in simulation costs globally.
- Publication impact: expected citations in top venues (NeurIPS, ICML, ICLR, Journal of Computational Physics) — estimated 200–500 citations within 5 years if validated.
🔓 If proven, this unlocks
Proving this hypothesis is a prerequisite for the following downstream discoveries and applications:
- adaptive-sampling-high-dimensional-parameter-spaces
- online-adaptive-surrogate-updating-real-time-control
- multi-fidelity-adaptive-sampling-dynamical-systems
- amortized-optimization-distribution-shift-robustness
- physics-informed-active-learning-pde-constrained-optimization
Prerequisites
These must be validated before this hypothesis can be confirmed:
- surrogate-model-amortized-optimization-foundations
- neural-operator-parametrized-pde-benchmarks
- active-learning-bayesian-optimization-convergence-guarantees
- uncertainty-quantification-deep-ensembles-reliability
Implementation Sketch
# Adaptive Sampling for Surrogate-Based Amortized Optimization
# Architecture Overview

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize
from scipy.stats import wilcoxon
from scipy.stats.qmc import LatinHypercube
import torch
import gpytorch
from botorch.models import SingleTaskGP
from botorch.acquisition.analytic import AnalyticAcquisitionFunction
from botorch.fit import fit_gpytorch_mll
from botorch.optim import optimize_acqf
from botorch.utils.transforms import t_batch_mode_transform

# ── 1. PARAMETRIZED DYNAMICAL SYSTEM INTERFACE ──────────────────────────────

class ParametrizedDynamicalSystem:
    """Abstract interface for parametrized ODE/PDE systems."""

    def __init__(self, param_bounds: dict):
        self.param_bounds = param_bounds  # {name: (low, high)}

    def simulate(self, params: np.ndarray) -> np.ndarray:
        """Run high-fidelity simulator; returns QoI scalar or vector."""
        raise NotImplementedError

    def param_dim(self) -> int:
        return len(self.param_bounds)

    @property
    def param_bounds_array(self) -> np.ndarray:
        """Bounds as a (2, d) array: row 0 = lows, row 1 = highs."""
        lows, highs = zip(*self.param_bounds.values())
        return np.array([lows, highs], dtype=float)


class LorenzSystem(ParametrizedDynamicalSystem):
    def simulate(self, params):
        sigma, rho, beta = params

        def lorenz(t, y):
            return [sigma * (y[1] - y[0]),
                    y[0] * (rho - y[2]) - y[1],
                    y[0] * y[1] - beta * y[2]]

        # Integrate past the transient; evaluate the QoI on t in [40, 50].
        sol = solve_ivp(lorenz, [0, 50], [1, 1, 1],
                        t_eval=np.linspace(40, 50, 100))
        return np.array([np.mean(sol.y[0] ** 2)])  # QoI: mean energy

# ── 2. SAMPLING STRATEGIES ──────────────────────────────────────────────────

class MaxVariance(AnalyticAcquisitionFunction):
    """Pure-exploration acquisition = posterior variance.

    Defined locally: BoTorch has no built-in analytic MaxVariance.
    """

    @t_batch_mode_transform(expected_q=1)
    def forward(self, X):
        return self.model.posterior(X).variance.squeeze(-1).squeeze(-1)


class SamplingStrategy:
    def select_next_query(self, X_observed, y_observed, surrogate, bounds):
        raise NotImplementedError


class UniformRandomSampling(SamplingStrategy):
    def select_next_query(self, X_observed, y_observed, surrogate, bounds):
        d = bounds.shape[1]
        return np.random.uniform(bounds[0], bounds[1], size=(1, d))


class AdaptiveGPSampling(SamplingStrategy):
    """Max-variance (uncertainty) based adaptive sampling."""

    def select_next_query(self, X_observed, y_observed, surrogate, bounds):
        # Optimize the acquisition function (max posterior variance).
        acq = MaxVariance(surrogate)
        candidate, _ = optimize_acqf(
            acq, bounds=torch.tensor(bounds, dtype=torch.float64),
            q=1, num_restarts=10, raw_samples=512)
        return candidate.detach().numpy()


class EnsembleAdaptiveSampling(SamplingStrategy):
    """Query-by-committee: maximize deep-ensemble disagreement (variance)."""

    def select_next_query(self, X_observed, y_observed, surrogate, bounds):
        candidates = np.random.uniform(bounds[0], bounds[1],
                                       size=(1000, bounds.shape[1]))
        X_cand = torch.tensor(candidates, dtype=torch.float32)
        with torch.no_grad():
            preds = torch.stack([m(X_cand) for m in surrogate.members])
        variance = preds.var(dim=0).squeeze()  # disagreement per candidate
        best_idx = variance.argmax().item()
        return candidates[best_idx:best_idx + 1]

# ── 3. SURROGATE MODEL ──────────────────────────────────────────────────────

class GPSurrogate:
    def __init__(self):
        self.model = None

    def fit(self, X: np.ndarray, y: np.ndarray):
        X_t = torch.tensor(X, dtype=torch.float64)
        y_t = torch.tensor(y, dtype=torch.float64).reshape(-1, 1)
        self.model = SingleTaskGP(X_t, y_t)
        # Use the likelihood attached to the model, not a detached one.
        mll = gpytorch.mlls.ExactMarginalLogLikelihood(
            self.model.likelihood, self.model)
        fit_gpytorch_mll(mll)

    def predict(self, X: np.ndarray):
        X_t = torch.tensor(X, dtype=torch.float64)
        self.model.eval()
        with torch.no_grad(), gpytorch.settings.fast_pred_var():
            pred = self.model(X_t)
            return pred.mean.numpy(), pred.variance.numpy()

    def rmse(self, X_test, y_test):
        y_pred, _ = self.predict(X_test)
        return np.sqrt(np.mean((y_pred - np.asarray(y_test).squeeze()) ** 2))

# ── 4. AMORTIZED OPTIMIZATION ───────────────────────────────────────────────

def amortized_optimize(surrogate, bounds, n_restarts=50):
    """Minimize the surrogate's predicted objective over the parameter space."""
    best_val, best_x = np.inf, None
    for _ in range(n_restarts):
        x0 = np.random.uniform(bounds[0], bounds[1])

        def surrogate_mean(x):
            mu, _ = surrogate.predict(x.reshape(1, -1))
            return mu[0]

        res = minimize(surrogate_mean, x0, method='L-BFGS-B',
                       bounds=list(zip(bounds[0], bounds[1])))
        if res.fun < best_val:
            best_val, best_x = res.fun, res.x
    return best_x, best_val

# ── 5. MAIN EXPERIMENT LOOP ─────────────────────────────────────────────────

def run_experiment(system, strategy, budgets, n_trials=10, n0=100):
    results = {b: {'rmse': [], 'opt_gap': [], 'wall_time': []} for b in budgets}
    lo, hi = system.param_bounds_array

    # Ground-truth optimum (expensive, computed once)
    grid = np.random.uniform(lo, hi, size=(5000, system.param_dim()))
    y_grid = np.array([system.simulate(x) for x in grid])
    true_opt_val = y_grid.min()

    # Held-out test set for RMSE evaluation
    X_test = np.random.uniform(lo, hi, size=(500, system.param_dim()))
    y_test = np.array([system.simulate(x) for x in X_test])

    for trial in range(n_trials):
        np.random.seed(trial)
        # Warm-start with LHS (identical across strategies)
        sampler = LatinHypercube(d=system.param_dim(), seed=trial)
        X = lo + sampler.random(n=n0) * (hi - lo)
        y = np.array([system.simulate(x) for x in X])
        surrogate = GPSurrogate()

        for budget in sorted(budgets):
            # Adaptive refinement until the budget is exhausted
            while len(X) < budget:
                surrogate.fit(X, y)
                x_next = strategy.select_next_query(
                    X, y, surrogate.model, system.param_bounds_array)
                y_next = system.simulate(x_next.squeeze())
                X = np.vstack([X, x_next])
                y = np.vstack([y, np.atleast_2d(y_next)])
            surrogate.fit(X, y)
            rmse = surrogate.rmse(X_test, y_test)
            _, opt_val = amortized_optimize(surrogate, system.param_bounds_array)
            opt_gap = abs(opt_val - true_opt_val) / (abs(true_opt_val) + 1e-8) * 100
            results[budget]['rmse'].append(rmse)
            results[budget]['opt_gap'].append(opt_gap)
    return results

# ── 6. STATISTICAL ANALYSIS ─────────────────────────────────────────────────

def analyze_results(adaptive_results, uniform_results, budgets):
    for b in budgets:
        rmse_adaptive = adaptive_results[b]['rmse']
        rmse_uniform = uniform_results[b]['rmse']
        stat, p = wilcoxon(rmse_adaptive, rmse_uniform)
        reduction = (1 - np.mean(rmse_adaptive) / np.mean(rmse_uniform)) * 100
        print(f"Budget {b}: RMSE reduction={reduction:.1f}%, p={p:.4f}")
    # Smallest budget at which each strategy clears both thresholds
    for name, res in [('adaptive', adaptive_results), ('uniform', uniform_results)]:
        for b in sorted(budgets):
            if (np.mean(res[b]['rmse']) <= 0.05
                    and np.mean(res[b]['opt_gap']) <= 2.0):
                print(f"{name}: thresholds reached at budget={b}")
                break
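The ensemble-disagreement rule in section 2 does not actually require torch; the same selection logic can be sketched dependency-free with numpy, using a hypothetical "ensemble" of three polynomial fits on a cheap stand-in objective:

```python
import numpy as np

rng = np.random.default_rng(0)

# Cheap stand-in for the simulator: f(x) = sin(3x) on [0, 2]
f = lambda x: np.sin(3 * x)
X_obs = rng.uniform(0.0, 2.0, size=8)
y_obs = f(X_obs)

# Hypothetical ensemble of surrogates: polynomial fits of degrees 2, 4, 6
members = [np.polynomial.Polynomial.fit(X_obs, y_obs, deg) for deg in (2, 4, 6)]

# Ensemble-disagreement query selection over random candidates
candidates = rng.uniform(0.0, 2.0, size=1000)
preds = np.stack([m(candidates) for m in members])   # shape (3, 1000)
disagreement = preds.var(axis=0)                     # committee variance
x_next = candidates[disagreement.argmax()]
print(round(float(x_next), 3))
```

The selected point tends to land where the members extrapolate differently, i.e. in under-sampled regions, which is exactly the behavior the adaptive strategy relies on.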
Kill-Switch Checkpoints
- Checkpoint 1 (Day 5, after N_0=100 warm-start): If surrogate RMSE on test set > 0.5 for all strategies on Lorenz system, the surrogate architecture is inappropriate for this system — abort and revise surrogate choice before proceeding.
- Checkpoint 2 (Day 10, budget=150): If adaptive sampling RMSE is ≥ 95% of uniform sampling RMSE across all 3 systems (no improvement signal), abort full experiment — hypothesis likely false for chosen systems/strategies; revise adaptive criterion.
- Checkpoint 3 (Day 15, budget=200): If query selection overhead (adaptive) exceeds 30% of total wall-clock time for d ≤ 5, abort and switch to cheaper acquisition function (random subspace optimization or Thompson sampling).
- Checkpoint 4 (Day 20, after 10 trials on System 1): If trial-to-trial variance in RMSE is > 50% of mean RMSE (CV > 0.5), results are too noisy for statistical conclusions — abort and increase N_0 or reduce surrogate complexity.
- Checkpoint 5 (Day 30, mid-full-validation): If LHS baseline matches adaptive on all 6 systems within 5% on all metrics, abort full validation — adaptive adds no value beyond smart initialization; reframe hypothesis.
- Checkpoint 6 (Day 40, before final analysis): If statistical tests show p > 0.2 for all system-strategy pairs, abort statistical analysis and report null result — do not proceed to commercial/deployment recommendations.
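Checkpoint 4's noise gate (abort if the coefficient of variation exceeds 0.5) is worth pinning down exactly; a sketch with fabricated trial RMSEs:

```python
import numpy as np

def too_noisy(trial_rmses, cv_limit=0.5):
    """Checkpoint 4: abort if the coefficient of variation exceeds cv_limit."""
    r = np.asarray(trial_rmses, dtype=float)
    return float(r.std(ddof=1) / r.mean()) > cv_limit

stable = [0.051, 0.048, 0.055, 0.047, 0.052]   # CV ~ 0.06: continue
noisy = [0.02, 0.15, 0.04, 0.30, 0.01]         # CV ~ 1.2: abort
print(too_noisy(stable), too_noisy(noisy))  # → False True
```

Using the sample standard deviation (`ddof=1`) is a choice; with only 10 trials it is the less optimistic estimator.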