Adaptive sampling in parametrized dynamical systems can improve the efficiency of surrogate-based amortized optimization when problem parameters vary continuously.

Physics · Mar 7, 2026 · Evaluation Score: 63%

Adversarial Debate Score

63% survival rate under critique

Model Critiques

google: The hypothesis is reasonable and falsifiable, especially given the "Uncertainty-Aware Calculation" paper, but the provided excerpts don't offer strong direct support for *amortized* optimization specifically. The other papers are tangentially related to optimization efficiency.
openai: The hypothesis is broadly plausible and falsifiable (compare surrogate-based amortized optimization with/without adaptive sampling over continuously varying parameters), and the reduced-order-model paper directly supports the “adaptive sampling for parametrized dynamical systems improves optimiza...
anthropic: The hypothesis is supported by the adaptive sampling paper on parametrized dynamical systems with model order reduction, which directly demonstrates efficiency gains through targeted sampling, but the connection to "surrogate-based amortized optimization" specifically is tenuous—the relevant pape...

Formal Verification

Z3 logical consistency: ✅ Consistent

Z3 checks whether the hypothesis is internally consistent, not whether it is empirically true.
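
Z3's role here is satisfiability checking; a minimal sketch of that kind of check, with an assumed encoding of the hypothesis's numeric thresholds (the actual constraint set is not shown on this page):

from z3 import Real, Solver, And

rmse = Real("rmse")              # surrogate normalized RMSE
call_ratio = Real("call_ratio")  # adaptive calls / uniform calls
opt_gap = Real("opt_gap")        # % gap to optimal objective value

s = Solver()
s.add(And(0 <= rmse, rmse <= 0.05,
          0 < call_ratio, call_ratio <= 0.5,
          0 <= opt_gap, opt_gap <= 2.0))
print(s.check())  # sat => the stated thresholds are mutually satisfiable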

Experimental Validation Package

This discovery has a Claude-generated validation package with a full experimental design.

Precise Hypothesis

Adaptive sampling strategies applied to the parameter space of parametrized dynamical systems will reduce the total number of high-fidelity simulator evaluations required to achieve a fixed surrogate model accuracy (measured by normalized RMSE ≤ 0.05) compared to uniform/random sampling baselines, when used within a surrogate-based amortized optimization pipeline and when problem parameters vary continuously over a compact domain. Specifically, adaptive sampling will achieve equivalent optimization quality (within 2% of optimal objective value) using ≤ 50% of the simulator calls required by uniform sampling.
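
The two headline metrics can be pinned down in a few lines; a minimal sketch, where normalizing RMSE by the range of the true values is an assumed convention:

import numpy as np

def normalized_rmse(y_pred, y_true):
    """RMSE divided by the range of the true values (assumed normalization)."""
    err = np.asarray(y_pred) - np.asarray(y_true)
    return np.sqrt(np.mean(err**2)) / (np.ptp(y_true) + 1e-12)

def optimization_gap_pct(found_val, true_opt_val):
    """Relative gap (%) between the surrogate-found objective and the true optimum."""
    return abs(found_val - true_opt_val) / (abs(true_opt_val) + 1e-12) * 100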

Disproof criteria:
  1. Adaptive sampling requires ≥ 90% of the simulator calls of uniform sampling to achieve equivalent surrogate RMSE (≤ 0.05) across 3+ benchmark systems — no meaningful efficiency gain.
  2. Optimization quality under the adaptive-surrogate pipeline is statistically worse (p < 0.05, paired t-test) than under the uniform-surrogate pipeline at matched simulator budgets (a test sketch follows this list).
  3. Adaptive sampling overhead (query selection time) exceeds 20% of total wall-clock time, negating computational savings.
  4. Surrogate trained on adaptive samples exhibits higher variance in optimization outcomes (std > 2× that of uniform baseline) across 10 independent runs, indicating instability.
  5. On ≥ 2 of 3 benchmark dynamical systems, adaptive sampling provides < 10% reduction in simulator calls at any fixed accuracy threshold.
  6. A simple heuristic (e.g., Latin Hypercube Sampling) matches or outperforms adaptive sampling on all benchmarks.
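
Criterion 2 reduces to a one-sided paired test at each matched budget; a minimal sketch, assuming per-trial optimization-gap arrays for the two pipelines (names are illustrative):

from scipy.stats import ttest_rel

def adaptive_statistically_worse(gaps_adaptive, gaps_uniform, alpha=0.05):
    """Disproof criterion 2: is the adaptive pipeline's gap significantly larger?"""
    stat, p_two_sided = ttest_rel(gaps_adaptive, gaps_uniform)
    p_one_sided = p_two_sided / 2 if stat > 0 else 1 - p_two_sided / 2
    return p_one_sided < alpha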

Experimental Protocol

Minimum Viable Test (MVT): Compare adaptive sampling (uncertainty-guided, e.g., active learning with GP or ensemble) vs. uniform random sampling vs. Latin Hypercube Sampling (LHS) as surrogate training data acquisition strategies on 3 parametrized dynamical systems of increasing complexity. Evaluate surrogate accuracy and downstream optimization quality as a function of simulator call budget. Run 10 independent trials per condition. Primary metric: simulator calls to reach RMSE ≤ 0.05 and optimization gap ≤ 2%.

Full Validation: Extend to 6 benchmark systems, 5 adaptive strategies, ablation over parameter space dimensionality (d = 2, 5, 10, 20), and real-world physics simulators (e.g., fluid dynamics, epidemiological models). Include wall-clock time profiling and scalability analysis.
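
For orientation, the full-validation grid fits in a single config; the system and strategy names below are placeholders rather than final choices:

FULL_VALIDATION = {
    "systems": ["lorenz63", "van_der_pol", "navier_stokes_2d",
                "sir", "duffing", "openfoam_pde"],          # 6 benchmark systems
    "strategies": ["gp_max_variance", "ensemble_disagreement",
                   "thompson_sampling", "integrated_variance",
                   "expected_improvement"],                  # 5 adaptive strategies
    "param_dims": [2, 5, 10, 20],                            # dimensionality ablation
    "n_trials": 10,
}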

Required datasets:
  1. Lorenz-63 system (parametrized by σ, ρ, β ∈ continuous ranges) — synthetic, self-generated via scipy/Julia DifferentialEquations.jl.
  2. Van der Pol oscillator (parametrized by damping coefficient μ ∈ [0.1, 5.0]) — synthetic (a concrete sketch follows this list).
  3. Parametrized 2D Navier-Stokes (viscosity ν, forcing amplitude f) — available via FEniCS or neuraloperator benchmark suite (https://github.com/neuraloperator/neuraloperator).
  4. SIR epidemiological model (β_infection, γ_recovery ∈ continuous ranges) — synthetic.
  5. Duffing oscillator (nonlinearity parameter, damping) — synthetic.
  6. (Full validation) OpenFOAM or DOLFIN-based PDE benchmark with ≥ 5 continuous parameters.
  7. Pre-trained neural operator checkpoints (FNO, DeepONet) from public repositories for transfer learning baselines.
  8. GPyTorch / BoTorch for GP-based adaptive sampling implementation.
  9. PyTorch Ensemble models for deep ensemble UQ baseline.
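
Dataset 2, for example, drops straight into the ParametrizedDynamicalSystem interface from the implementation sketch below; the QoI (limit-cycle amplitude after the transient) is an illustrative choice:

class VanDerPolSystem(ParametrizedDynamicalSystem):
    def simulate(self, params):
        mu, = params  # damping coefficient, μ ∈ [0.1, 5.0]
        def vdp(t, y):
            return [y[1], mu * (1 - y[0]**2) * y[1] - y[0]]
        sol = solve_ivp(vdp, [0, 50], [1.0, 0.0],
                        t_eval=np.linspace(40, 50, 100))
        return np.array([np.max(np.abs(sol.y[0]))])  # QoI: limit-cycle amplitude
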
Success:
  1. Adaptive sampling achieves RMSE ≤ 0.05 using ≤ 50% of simulator calls vs. uniform sampling on ≥ 2 of 3 benchmark systems (primary criterion).
  2. Optimization gap ≤ 2% achieved by adaptive strategy at ≤ 60% of the simulator budget required by uniform sampling on ≥ 2 of 3 systems.
  3. Statistical significance: p < 0.05 (Wilcoxon signed-rank) for simulator call savings on ≥ 2 systems.
  4. Adaptive query selection overhead < 5% of total wall-clock time for d ≤ 10.
  5. Results replicate across ≥ 8 of 10 independent trials (80% replication rate).
  6. Efficiency gain is monotonically increasing with simulator cost-to-surrogate-inference ratio (correlation r > 0.7).
Failure:
  1. Adaptive sampling requires > 80% of uniform sampling budget to reach RMSE ≤ 0.05 on all 3 benchmark systems.
  2. Optimization gap under adaptive pipeline exceeds 5% at any matched budget where uniform achieves ≤ 2%.
  3. Query selection overhead exceeds 20% of total wall-clock time for d ≤ 10.
  4. Results are not reproducible: > 3 of 10 trials show qualitatively different outcomes (e.g., adaptive worse than uniform).
  5. LHS alone matches adaptive sampling performance within 5% on all metrics — adaptive adds no value over smart initialization.
  6. Surrogate RMSE fails to decrease monotonically with budget for adaptive strategy (indicating instability) in ≥ 2 systems.

GPU hours: 480
Time to result: 45 days
Min cost: $1,200
Full cost: $8,500

ROI Projection

Commercial:
  1. Engineering simulation software (ANSYS, Siemens, Dassault): adaptive surrogate training modules could be integrated as premium features, estimated market value $10M–$50M in licensing.
  2. Pharmaceutical/biotech: parametrized ODE models for drug PK/PD optimization — 50% reduction in simulation costs for clinical trial design optimization.
  3. Climate/weather modeling: adaptive sampling for parametrized climate models reduces ensemble simulation costs for uncertainty quantification.
  4. Autonomous systems: real-time adaptive surrogate updating for robotics and control systems operating in varying environments.
  5. Financial modeling: parametrized stochastic differential equations for option pricing and risk optimization — faster calibration pipelines.
  6. Defense/aerospace: aerodynamic shape optimization with CFD surrogates — direct cost reduction in design cycles.
  7. Estimated total addressable market for adaptive surrogate optimization tools: $200M–$2B over 10 years across engineering simulation verticals.
Research:
  1. Direct computational savings: 50% reduction in simulator calls translates to 50% reduction in HPC costs for surrogate training pipelines. For a typical engineering optimization workflow consuming $100K/year in simulation costs, this yields $50K/year savings per project.
  2. Accelerated design cycles: reducing surrogate training time from weeks to days enables 3–5× more design iterations per project timeline.
  3. Democratization: smaller labs with limited HPC budgets can access surrogate-based optimization previously requiring large compute clusters.
  4. Estimated aggregate research impact: if adopted across 1,000 active computational physics/engineering groups, potential savings of $50M–$500M/year in simulation costs globally.
  5. Publication impact: expected citations in top venues (NeurIPS, ICML, ICLR, Journal of Computational Physics) — estimated 200–500 citations within 5 years if validated.

🔓 If proven, this unlocks

Proving this hypothesis is a prerequisite for the following downstream discoveries and applications:

  • adaptive-sampling-high-dimensional-parameter-spaces
  • online-adaptive-surrogate-updating-real-time-control
  • multi-fidelity-adaptive-sampling-dynamical-systems
  • amortized-optimization-distribution-shift-robustness
  • physics-informed-active-learning-pde-constrained-optimization

Implementation Sketch

# Adaptive Sampling for Surrogate-Based Amortized Optimization
# Architecture Overview

import time
import numpy as np
from scipy.integrate import solve_ivp
import torch
import gpytorch
from botorch.models import SingleTaskGP
from botorch.acquisition.analytic import AnalyticAcquisitionFunction
from botorch.fit import fit_gpytorch_mll
from botorch.utils.transforms import t_batch_mode_transform

# ── 1. PARAMETRIZED DYNAMICAL SYSTEM INTERFACE ──────────────────────────────
class ParametrizedDynamicalSystem:
    """Abstract interface for parametrized ODE/PDE systems."""
    def __init__(self, param_bounds: dict):
        self.param_bounds = param_bounds  # {name: (low, high)}
    
    def simulate(self, params: np.ndarray) -> np.ndarray:
        """Run high-fidelity simulator; returns QoI scalar or vector."""
        raise NotImplementedError
    
    def param_dim(self) -> int:
        return len(self.param_bounds)
    
    @property
    def param_bounds_array(self) -> np.ndarray:
        """Bounds as a (2, d) array (row 0: lows, row 1: highs); used by run_experiment."""
        lows, highs = zip(*self.param_bounds.values())
        return np.array([lows, highs], dtype=float)

class LorenzSystem(ParametrizedDynamicalSystem):
    def simulate(self, params):
        sigma, rho, beta = params
        def lorenz(t, y):
            return [sigma*(y[1]-y[0]), y[0]*(rho-y[2])-y[1], y[0]*y[1]-beta*y[2]]
        sol = solve_ivp(lorenz, [0, 50], [1,1,1], dense_output=False, 
                        t_eval=np.linspace(40,50,100))
        return np.array([np.mean(sol.y[0]**2)])  # QoI: mean energy

# ── 2. SAMPLING STRATEGIES ───────────────────────────────────────────────────
class SamplingStrategy:
    def select_next_query(self, X_observed, y_observed, surrogate, bounds):
        raise NotImplementedError

class UniformRandomSampling(SamplingStrategy):
    def select_next_query(self, X_observed, y_observed, surrogate, bounds):
        d = bounds.shape[1]
        return np.random.uniform(bounds[0], bounds[1], size=(1, d))

class MaxVariance(AnalyticAcquisitionFunction):
    """Posterior-variance (pure exploration) acquisition.
    BoTorch has no built-in MaxVariance, so we define it here."""
    @t_batch_mode_transform(expected_q=1)
    def forward(self, X):
        return self.model.posterior(X).variance.squeeze(-1).squeeze(-1)

class AdaptiveGPSampling(SamplingStrategy):
    """Max-variance (uncertainty) based adaptive sampling."""
    def select_next_query(self, X_observed, y_observed, surrogate, bounds):
        # Maximize posterior variance over the parameter domain
        from botorch.optim import optimize_acqf
        acq = MaxVariance(surrogate)
        candidate, _ = optimize_acqf(
            acq, bounds=torch.tensor(bounds, dtype=torch.float64),
            q=1, num_restarts=10, raw_samples=512
        )
        return candidate.detach().numpy()

class EnsembleAdaptiveSampling(SamplingStrategy):
    """Ensemble-disagreement criterion: query where member predictions vary most."""
    def select_next_query(self, X_observed, y_observed, surrogate, bounds):
        candidates = np.random.uniform(bounds[0], bounds[1], size=(1000, bounds.shape[1]))
        X_cand = torch.tensor(candidates, dtype=torch.float32)
        with torch.no_grad():
            preds = torch.stack([m(X_cand) for m in surrogate.members])  # (n_members, N, 1)
        variance = preds.var(dim=0).squeeze()
        best_idx = variance.argmax().item()
        return candidates[best_idx:best_idx+1]
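
# The MVT also calls for an LHS baseline, which is a one-shot design rather than a
# sequential rule. A minimal sketch (an assumed adaptation, not a standard method):
# pre-generate the design for the full budget, then serve points one at a time.
class LHSSampling(SamplingStrategy):
    def __init__(self, total_budget, bounds):
        from scipy.stats.qmc import LatinHypercube
        unit = LatinHypercube(d=bounds.shape[1]).random(n=total_budget)
        self.design = unit * (bounds[1] - bounds[0]) + bounds[0]
        self.cursor = 0

    def select_next_query(self, X_observed, y_observed, surrogate, bounds):
        pt = self.design[self.cursor:self.cursor + 1]
        self.cursor += 1
        return pt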

# ── 3. SURROGATE MODEL ───────────────────────────────────────────────────────
class GPSurrogate:
    def __init__(self):
        self.model = None
        self.likelihood = None
    
    def fit(self, X: np.ndarray, y: np.ndarray):
        X_t = torch.tensor(X, dtype=torch.float64)
        y_t = torch.tensor(y.squeeze(), dtype=torch.float64)
        self.model = SingleTaskGP(X_t, y_t.unsqueeze(-1))
        self.likelihood = self.model.likelihood  # SingleTaskGP bundles its own likelihood
        mll = gpytorch.mlls.ExactMarginalLogLikelihood(self.model.likelihood, self.model)
        fit_gpytorch_mll(mll)
    
    def predict(self, X: np.ndarray):
        X_t = torch.tensor(X, dtype=torch.float64)
        self.model.eval()
        with torch.no_grad(), gpytorch.settings.fast_pred_var():
            pred = self.model(X_t)
        return pred.mean.numpy(), pred.variance.numpy()
    
    def rmse(self, X_test, y_test):
        y_pred, _ = self.predict(X_test)
        return np.sqrt(np.mean((y_pred - y_test.squeeze())**2))
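
# EnsembleAdaptiveSampling above expects a surrogate exposing `.members`. A minimal
# deep-ensemble sketch to pair with it (architecture and hyperparameters illustrative):
import torch.nn as nn

class DeepEnsembleSurrogate:
    def __init__(self, in_dim, n_members=5, hidden=64):
        self.members = [nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden), nn.ReLU(),
                                      nn.Linear(hidden, 1))
                        for _ in range(n_members)]

    def fit(self, X, y, epochs=500, lr=1e-3):
        X_t = torch.tensor(X, dtype=torch.float32)
        y_t = torch.tensor(y.reshape(-1, 1), dtype=torch.float32)
        for m in self.members:  # independent fits supply the disagreement signal
            opt = torch.optim.Adam(m.parameters(), lr=lr)
            for _ in range(epochs):
                opt.zero_grad()
                nn.functional.mse_loss(m(X_t), y_t).backward()
                opt.step()

    def predict(self, X):
        X_t = torch.tensor(X, dtype=torch.float32)
        with torch.no_grad():
            preds = torch.stack([m(X_t) for m in self.members])  # (n_members, N, 1)
        return preds.mean(0).squeeze(-1).numpy(), preds.var(0).squeeze(-1).numpy()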

# ── 4. AMORTIZED OPTIMIZATION ────────────────────────────────────────────────
def amortized_optimize(surrogate, bounds, n_restarts=50):
    """Minimize the surrogate posterior mean over the parameter space (multi-start L-BFGS-B)."""
    from scipy.optimize import minimize
    best_val, best_x = np.inf, None
    
    def surrogate_mean(x):
        mu, _ = surrogate.predict(x.reshape(1, -1))
        return mu[0]
    
    for _ in range(n_restarts):
        x0 = np.random.uniform(bounds[0], bounds[1])
        res = minimize(surrogate_mean, x0, method='L-BFGS-B',
                       bounds=list(zip(bounds[0], bounds[1])))
        if res.fun < best_val:
            best_val, best_x = res.fun, res.x
    return best_x, best_val

# ── 5. MAIN EXPERIMENT LOOP ──────────────────────────────────────────────────
def run_experiment(system, strategy, budgets, n_trials=10, n0=100):
    results = {b: {'rmse': [], 'opt_gap': [], 'wall_time': []} for b in budgets}
    
    # Ground truth optimum (expensive, computed once)
    grid = np.random.uniform(system.param_bounds_array[0], 
                             system.param_bounds_array[1], size=(5000, system.param_dim()))
    y_grid = np.array([system.simulate(x) for x in grid])
    true_opt_val = y_grid.min()
    
    # Test set for RMSE evaluation
    X_test = np.random.uniform(system.param_bounds_array[0],
                               system.param_bounds_array[1], size=(500, system.param_dim()))
    y_test = np.array([system.simulate(x) for x in X_test])
    
    for trial in range(n_trials):
        np.random.seed(trial)
        # Warm-start with LHS (identical across strategies)
        from scipy.stats.qmc import LatinHypercube
        sampler = LatinHypercube(d=system.param_dim())
        X = sampler.random(n=n0)
        X = X * (system.param_bounds_array[1] - system.param_bounds_array[0]) + system.param_bounds_array[0]
        y = np.array([system.simulate(x) for x in X])
        
        surrogate = GPSurrogate()
        
        for budget in sorted(budgets):
            # Adaptive refinement to reach budget (timed for overhead profiling)
            t0 = time.perf_counter()
            while len(X) < budget:
                surrogate.fit(X, y)
                x_next = strategy.select_next_query(X, y, surrogate.model, 
                                                    system.param_bounds_array)
                y_next = system.simulate(x_next.squeeze())
                X = np.vstack([X, x_next])
                y = np.append(y.ravel(), y_next)  # keep y one-dimensional
            wall_time = time.perf_counter() - t0
            
            surrogate.fit(X, y)
            rmse = surrogate.rmse(X_test, y_test)
            _, opt_val = amortized_optimize(surrogate, system.param_bounds_array)
            opt_gap = abs(opt_val - true_opt_val) / (abs(true_opt_val) + 1e-8) * 100
            
            results[budget]['rmse'].append(rmse)
            results[budget]['opt_gap'].append(opt_gap)
            results[budget]['wall_time'].append(wall_time)
    
    return results

# ── 6. STATISTICAL ANALYSIS ──────────────────────────────────────────────────
def analyze_results(adaptive_results, uniform_results, budgets):
    from scipy.stats import wilcoxon
    for b in budgets:
        rmse_adaptive = adaptive_results[b]['rmse']
        rmse_uniform = uniform_results[b]['rmse']
        stat, p = wilcoxon(rmse_adaptive, rmse_uniform)
        reduction = (1 - np.mean(rmse_adaptive)/np.mean(rmse_uniform)) * 100
        print(f"Budget {b}: mean RMSE reduction={reduction:.1f}%, p={p:.4f}")
    
    # Find simulator calls to threshold
    for strategy_name, res in [('adaptive', adaptive_results), ('uniform', uniform_results)]:
        for b in budgets:
            if np.mean(res[b]['rmse']) <= 0.05 and np.mean(res[b]['opt_gap']) <= 2.0:
                print(f"{strategy_name}: threshold reached at budget={b}")
                break
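
# ── 7. EXAMPLE USAGE ─────────────────────────────────────────────────────────
# Illustrative end-to-end run on the Lorenz system; the parameter ranges and
# budgets below are placeholders, not the protocol's final values.
if __name__ == "__main__":
    system = LorenzSystem(param_bounds={"sigma": (8.0, 12.0),
                                        "rho": (20.0, 35.0),
                                        "beta": (2.0, 3.5)})
    budgets = [150, 200, 300]
    adaptive = run_experiment(system, AdaptiveGPSampling(), budgets, n_trials=10)
    uniform = run_experiment(system, UniformRandomSampling(), budgets, n_trials=10)
    analyze_results(adaptive, uniform, budgets)
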
Abort checkpoints:
  1. Checkpoint 1 (Day 5, after N_0=100 warm-start): If surrogate RMSE on test set > 0.5 for all strategies on Lorenz system, the surrogate architecture is inappropriate for this system — abort and revise surrogate choice before proceeding.
  2. Checkpoint 2 (Day 10, budget=150): If adaptive sampling RMSE is ≥ 95% of uniform sampling RMSE across all 3 systems (no improvement signal), abort full experiment — hypothesis likely false for chosen systems/strategies; revise adaptive criterion.
  3. Checkpoint 3 (Day 15, budget=200): If query selection overhead (adaptive) exceeds 30% of total wall-clock time for d ≤ 5, abort and switch to cheaper acquisition function (random subspace optimization or Thompson sampling).
  4. Checkpoint 4 (Day 20, after 10 trials on System 1): If trial-to-trial variance in RMSE is > 50% of mean RMSE (CV > 0.5), results are too noisy for statistical conclusions — abort and increase N_0 or reduce surrogate complexity (a CV helper is sketched after this list).
  5. Checkpoint 5 (Day 30, mid-full-validation): If LHS baseline matches adaptive on all 6 systems within 5% on all metrics, abort full validation — adaptive adds no value beyond smart initialization; reframe hypothesis.
  6. Checkpoint 6 (Day 40, before final analysis): If statistical tests show p > 0.2 for all system-strategy pairs, abort statistical analysis and report null result — do not proceed to commercial/deployment recommendations.
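
Checkpoint 4's noise gate, for instance, reduces to a coefficient-of-variation test over the per-trial RMSE lists that run_experiment collects; a minimal sketch:

import numpy as np

def too_noisy(rmse_trials, cv_threshold=0.5):
    """Abort checkpoint 4: trial-to-trial coefficient of variation of RMSE."""
    rmse = np.asarray(rmse_trials)
    return rmse.std() / (rmse.mean() + 1e-12) > cv_threshold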

Source

AegisMind Research