Metamodeling for Computational Efficiency: 99% Speedup via Polynomial Chaos Expansions
The Computational Bottleneck
Value of Information (VoI) analysis is the gold standard for quantifying the ROI of data before you collect it—critical for multi-million dollar sensor network investments. But there's a problem: it's computationally intractable.
The mathematics requires calculating a nested expectation (double-loop Monte Carlo):
E_Cost = E_X[ E_θ[ Cost(X, θ) ] ]
Where:
- Outer loop (E_X): average over all possible future measurements X (thousands of scenarios)
- Inner loop (E_θ): for each measurement, compute the expected cost over all model uncertainties θ (thousands more samples)
Result: a single VoI calculation requires tens of millions of model evaluations, taking 8-12 hours on HPC clusters. This makes VoI:
- Impractical for sensitivity analysis (which would need thousands of VoI calculations)
- Unusable for real-time decision support
- Limited to academic toy problems rather than operational tools
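To make the bottleneck concrete, here is a minimal sketch of the double-loop estimator, assuming a toy stand-in cost_model and illustrative sample counts (neither is the project's actual model):

import numpy as np

rng = np.random.default_rng(0)

def cost_model(x, theta):
    # Hypothetical, instant stand-in for the expensive engineering model;
    # in the real workflow each call is a physics-based simulation
    return (x - theta) ** 2

N_outer, N_inner = 2_000, 2_000                # thousands of scenarios each way
inner_means = np.empty(N_outer)
for i in range(N_outer):                       # outer loop: possible measurements X
    x = rng.normal()
    thetas = rng.normal(size=N_inner)          # inner loop: model uncertainties θ given X
    inner_means[i] = cost_model(x, thetas).mean()

E_cost = inner_means.mean()                    # E_X[ E_θ[ Cost(X, θ) ] ]
# 2,000 × 2,000 = 4 million model calls for one expectation; with a model that
# takes seconds to hours per call, this is exactly the 8-12 hour bottleneck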
I broke VoI out of its computational prison using advanced metamodeling (surrogate modeling) strategies, achieving >99% speedup while preserving R² > 0.999 accuracy.
Impact: Enabled the large-scale Global Sensitivity Analysis that revealed model bias as the dominant driver of information value (see Model Risk project), transforming VoI from theoretical framework to practical decision tool.
Metamodeling Fundamentals
Core idea: Replace expensive computational model with cheap-to-evaluate surrogate that approximates the input-output relationship.
Workflow:
1. Design of Experiments: select a small set of training points (100-500) using space-filling designs
2. High-Fidelity Evaluation: run the expensive model at the training points only
3. Surrogate Training: fit a lightweight metamodel to the training data
4. Rapid Prediction: evaluate the surrogate millions of times (microseconds each) instead of running the expensive model
Key requirement: Surrogate must accurately capture model behavior across the input space with minimal training data.
Why This Matters for ML
Metamodeling addresses a fundamental ML challenge: what do you do when model evaluation is prohibitively expensive?
Traditional ML context: Training data is cheap (labeled examples), model prediction is fast (forward pass through neural network).
Scientific ML / Simulation-based contexts: Generating training data is expensive (each label requires running a physics simulation), surrogate prediction must be extremely fast to enable downstream applications.
Applications:
- Hyperparameter tuning for expensive models (neural architecture search)
- Simulation-based inference (likelihood-free Bayesian inference)
- Physics-informed neural networks (PINNs), where the physics simulation is the bottleneck
- Active learning with expensive oracles
Solution 1: Polynomial Chaos Expansion (PCE) - The Winner
Spectral Method Fundamentals
PCE represents the model output as a weighted sum of orthonormal polynomials:
Y(θ) ≈ Σ α_i Ψ_i(θ)
Where:
- Y(θ): model output (e.g., expected cost in the VoI framework)
- θ: input random variables (uncertain parameters)
- Ψ_i(θ): Hermite polynomials (an orthonormal basis for Gaussian-distributed inputs)
- α_i: coefficients to be determined
Why Hermite Polynomials?
For Gaussian-distributed inputs, Hermite polynomials form an orthonormal basis with respect to the Gaussian probability measure:
E[ Ψ_i(θ) · Ψ_j(θ) ] = δ_ij (Kronecker delta: 1 if i=j, 0 otherwise)
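This property is easy to check numerically; here is a minimal sketch using NumPy's probabilists' Hermite polynomials He_n, normalized by √(n!) to get the orthonormal convention used here:

import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval

rng = np.random.default_rng(0)
theta = rng.standard_normal(1_000_000)   # samples from the Gaussian measure

def psi(n, x):
    # Orthonormal probabilists' Hermite polynomial: He_n(x) / sqrt(n!)
    c = np.zeros(n + 1); c[n] = 1.0
    return hermeval(x, c) / np.sqrt(factorial(n))

for i in range(3):
    print([round(float(np.mean(psi(i, theta) * psi(j, theta))), 2) for j in range(3)])
# Prints approximately the 3x3 identity matrix, i.e. E[Ψ_i Ψ_j] = δ_ij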
This orthogonality provides two critical advantages:
1. Analytical Expected Value:
E[Y] = E[Σ α_i Ψ_i(θ)] = Σ α_i E[Ψ_i(θ)] = α_0
Because Ψ_0 ≡ 1 and, by orthogonality with Ψ_0, E[Ψ_i(θ)] = 0 for every i ≥ 1, the expected value is simply the first coefficient α_0. This eliminates the need for the inner Monte Carlo loop entirely, a massive computational saving.
2. Efficient Coefficient Estimation:
Using the orthogonality property:
α_i = E[Y(θ) · Ψ_i(θ)]
We can estimate coefficients via regression on a small training set:
# Training data: (θ_1, y_1), (θ_2, y_2), ..., (θ_N, y_N)
# N = 100-500 samples (far fewer than Monte Carlo would need)
# Build the design matrix: row n holds every basis polynomial evaluated at θ_n
Psi = [[Ψ_0(θ_1), Ψ_1(θ_1), ..., Ψ_P(θ_1)],
       [Ψ_0(θ_2), Ψ_1(θ_2), ..., Ψ_P(θ_2)],
       ...
       [Ψ_0(θ_N), Ψ_1(θ_N), ..., Ψ_P(θ_N)]]
# Solve the least-squares regression for the coefficients
# (equivalent to alpha = (Psi' Psi)^(-1) Psi' y, but numerically better behaved)
alpha = np.linalg.lstsq(Psi, y, rcond=None)[0]
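For a fully runnable one-dimensional version of the same regression step (the toy_model function, sample size, and degree are illustrative placeholders, not the project's VoI model):

import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval

def hermite_design_matrix(theta, degree):
    # Column n holds the orthonormal Hermite polynomial He_n(θ) / sqrt(n!)
    cols = []
    for n in range(degree + 1):
        c = np.zeros(n + 1); c[n] = 1.0
        cols.append(hermeval(theta, c) / np.sqrt(factorial(n)))
    return np.column_stack(cols)

def toy_model(t):
    # Hypothetical smooth stand-in for the expensive simulation
    return np.exp(0.3 * t) + 0.5 * t ** 2

rng = np.random.default_rng(0)
theta_train = rng.standard_normal(200)                  # Gaussian training inputs
y_train = toy_model(theta_train)                        # "expensive" evaluations

Psi = hermite_design_matrix(theta_train, degree=3)
alpha, *_ = np.linalg.lstsq(Psi, y_train, rcond=None)   # least-squares coefficients

print("PCE mean estimate, alpha_0:", alpha[0])
print("Brute-force Monte Carlo   :", toy_model(rng.standard_normal(200_000)).mean())
# The two agree closely, illustrating E[Y] = alpha_0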
Implementation: Design of Experiments
Challenge: How to select training points for maximum information with minimal samples?
Solution: Sobol' sequences (quasi-random, low-discrepancy sampling).
Unlike pseudo-random sampling (which clusters and leaves gaps), Sobol' sequences systematically fill the input space, providing better coverage with fewer points:
import numpy as np
from scipy.stats import norm
from scipy.stats.qmc import Sobol

# Generate 100 quasi-random samples in 5D space (Sobol' points live in [0, 1)^5)
sampler = Sobol(d=5, scramble=True)
u_train = sampler.random(n=100)

# Map the unit-hypercube points to the Gaussian inputs the Hermite basis expects
theta_train = norm.ppf(u_train)

# Run the expensive model at these 100 points only
y_train = np.array([expensive_model(t) for t in theta_train])

# Fit the PCE surrogate
pce_model = fit_polynomial_chaos(theta_train, y_train, degree=3)

# Now evaluate the surrogate millions of times cheaply
theta_test = norm.ppf(sampler.random(n=1_000_000))
y_pred = pce_model.predict(theta_test)  # microseconds per evaluation
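A small holdout check in the same placeholder style gives the R² figure quoted in the next section (sketch; expensive_model and pce_model are the same placeholders as above):

# Hold out a handful of expensive runs never used for training
theta_holdout = norm.ppf(sampler.random(n=20))
y_true = np.array([expensive_model(t) for t in theta_holdout])
y_hat = pce_model.predict(theta_holdout)

# Coefficient of determination R² on the holdout set
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print("Holdout R² =", 1.0 - ss_res / ss_tot)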
Results: Spectacular Success
Accuracy: R² = 0.999 with only 100 training samples
Speedup: a 100-fold reduction in computational cost
- Before: 10,000 Monte Carlo samples × 8 hours each, a massive computational burden
- After: 100 training samples × 8 hours each, roughly 800 core-hours as a one-time cost (about 12 wall-clock hours when parallelized on the cluster), after which the surrogate can be evaluated millions of times in seconds
Downstream enablement: this speedup made the 8,000-iteration Global Sensitivity Analysis feasible (see Model Risk project). Without PCE, that study would have required ~64,000 HPC core-hours (more than 7 years of serial compute time).
Why PCE Succeeded
The underlying VoI cost function had properties ideally suited for spectral methods:
- Smooth: No discontinuities or sharp gradients
- Low effective dimensionality: Despite many input parameters, output dominated by a few key variables
- Polynomial-like structure: Expected costs are sums and products of random variables—naturally approximated by polynomial expansions
PCE leverages these properties through the spectral convergence phenomenon: for smooth functions, polynomial approximations converge exponentially fast as polynomial degree increases.
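A quick way to see spectral convergence in action (a self-contained sketch on a smooth toy target, not the VoI cost function itself):

import numpy as np
from numpy.polynomial.hermite_e import hermefit, hermeval

rng = np.random.default_rng(1)
theta_fit = rng.standard_normal(2_000)     # training inputs
theta_val = rng.standard_normal(2_000)     # validation inputs
f = lambda t: np.exp(0.5 * t)              # smooth toy target

for degree in range(1, 7):
    coeffs = hermefit(theta_fit, f(theta_fit), degree)   # least-squares Hermite fit
    resid = f(theta_val) - hermeval(theta_val, coeffs)
    print(degree, f"RMS validation error = {np.sqrt(np.mean(resid ** 2)):.1e}")
# For a smooth target the error shrinks geometrically as the degree increases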
Solution 2: Kriging (Gaussian Process Regression) - The Instructive Failure
Method Overview
Kriging (also known as Gaussian Process Regression in the ML community) is a non-parametric approach modeling the output as a realization of a Gaussian Process:
Y(θ) ~ GP(μ(θ), k(θ, θ'))
Where:
- μ(θ): mean function (often assumed constant)
- k(θ, θ'): covariance function (kernel) controlling smoothness
Prediction at a new point θ*:
ŷ(θ*) = μ + k*ᵀ · K^(-1) · (y_train - μ)
Where:
- k*: vector of covariances between θ* and the training points
- K: covariance matrix between the training points
- y_train: observed outputs at the training points
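A minimal NumPy sketch of this predictor with a squared-exponential kernel and a constant (sample-mean) trend; the kernel choice, toy data, and fixed hyperparameters are illustrative, not the project's UQLab implementation:

import numpy as np

def kernel(A, B, length_scale=1.0, sigma2=1.0):
    # Squared-exponential covariance k(θ, θ') between two point sets
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return sigma2 * np.exp(-0.5 * d2 / length_scale ** 2)

def kriging_predict(theta_star, theta_train, y_train, nugget=1e-8):
    # GP / simple-Kriging-style posterior mean with a constant trend μ
    mu = y_train.mean()
    K = kernel(theta_train, theta_train) + nugget * np.eye(len(theta_train))
    k_star = kernel(theta_star, theta_train)
    return mu + k_star @ np.linalg.solve(K, y_train - mu)

# Hypothetical usage with a cheap stand-in for the expensive model
rng = np.random.default_rng(2)
theta_train = rng.uniform(-2.0, 2.0, size=(50, 2))
y_train = np.sin(theta_train[:, 0]) + 0.5 * theta_train[:, 1]
theta_star = rng.uniform(-2.0, 2.0, size=(5, 2))
print(kriging_predict(theta_star, theta_train, y_train))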
Implementation Details
I implemented Ordinary Kriging with exponential variogram:
γ(h) = σ² [1 - exp(-3h/a)]
Where:
- h: distance between points
- σ²: sill (total variance)
- a: range (correlation length scale)
Variogram parameters estimated via maximum likelihood.
The Failure: Stationarity Violation
Result: Kriging performed poorly, with R² < 0.7 and systematic prediction errors.
Root cause analysis: The underlying VoI cost function violated Kriging's second-order stationarity assumption.
Second-order stationarity requires:
1. Constant mean: E[Y(θ)] = μ for all θ
2. Stationary covariance: Cov(Y(θ), Y(θ')) depends only on ||θ - θ'||, not on absolute position
Actual function behavior: the VoI cost surface had a distinct bilinear trend:
- High costs in one region of parameter space (high-risk scenarios requiring expensive interventions)
- Low costs in another region (low-risk scenarios needing minimal action)
- A clear directional gradient violating the constant-mean assumption
Visual diagnosis: the semivariogram plot showed:
- The theoretical exponential model (a smooth, saturating curve)
- The empirical semivariogram (non-stationary, trending behavior)
- A clear mismatch indicating failure of the model's assumptions
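The kind of diagnostic involved can be sketched in a few lines: compute an empirical semivariogram and compare it with the saturating exponential model (the distance binning and the trending toy surface are illustrative, not the original MATLAB/UQLab analysis):

import numpy as np

def empirical_semivariogram(theta, y, n_bins=15):
    # γ̂(h): half the mean squared difference over point pairs in each distance bin
    d = np.linalg.norm(theta[:, None, :] - theta[None, :, :], axis=-1)
    sq = 0.5 * (y[:, None] - y[None, :]) ** 2
    iu = np.triu_indices(len(y), k=1)                     # unique pairs only
    edges = np.linspace(0.0, d[iu].max(), n_bins + 1)
    which = np.digitize(d[iu], edges) - 1
    gamma = np.array([sq[iu][which == b].mean() for b in range(n_bins)])
    return 0.5 * (edges[:-1] + edges[1:]), gamma

def exponential_model(h, sill, a):
    # γ(h) = σ² [1 - exp(-3h/a)], the theoretical model fitted above
    return sill * (1.0 - np.exp(-3.0 * h / a))

# A trending (non-stationary) toy surface: the empirical curve keeps climbing
# instead of levelling off at a sill, which is exactly the mismatch described above
rng = np.random.default_rng(3)
theta = rng.uniform(0.0, 1.0, size=(200, 2))
y = 5.0 * theta[:, 0] + rng.normal(scale=0.1, size=200)   # strong directional trend
h, gamma_hat = empirical_semivariogram(theta, y)
print(np.round(gamma_hat, 2))                             # grows roughly like h²
print(np.round(exponential_model(h, sill=1.0, a=0.5), 2)) # saturates at the sill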
Why This Failure Is Valuable
Many practitioners blindly apply "off-the-shelf" ML methods without validating assumptions. This project demonstrates:
- Critical method selection: Understanding when and why methods work, not just how to run the code
- Assumption validation: Testing core assumptions (stationarity, smoothness, etc.) before trusting predictions
- Diagnostic rigor: Using theoretical plots (semivariograms) to identify failure modes
- Fallback strategies: Having alternative approaches (PCE) when first choice fails
ML parallel: Blindly applying Gaussian Processes for all surrogate modeling is like using deep neural networks for all prediction tasks—sometimes simpler methods (linear regression, polynomial regression) outperform complex ones when data structure suits them better.
Computational Impact: Enabling Large-Scale Studies
The Domino Effect
PCE metamodeling didn't just speed up VoI—it enabled entirely new classes of analysis:
Before metamodeling:
- Single VoI calculation: 8-12 hours
- Sensitivity analysis requiring 8,000 VoI calculations: IMPOSSIBLE
After PCE metamodeling:
- 100 expensive model evaluations (one-time cost): 12 hours
- 8,000 cheap PCE evaluations: 4 hours total
- Net result: made the impossible feasible
This cascading efficiency gain enabled the Model Risk project's key findings (model bias > sensor precision), which have implications for responsible AI deployment.
HPC Implementation
I deployed the metamodeling workflow on SLURM-managed Linux HPC clusters:
#!/bin/bash
#SBATCH --job-name=pce_metamodel
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=16
#SBATCH --time=12:00:00
# Parallel training point evaluation
srun python evaluate_expensive_model.py --samples 100
# Fit PCE surrogate (cheap, single-core)
python fit_pce_surrogate.py --degree 3 --training-data results/
# Validate with holdout set
python validate_surrogate.py --holdout 20
The parallelization strategy: embarrassingly parallel expensive model evaluations (each training point independent), sequential cheap surrogate fitting.
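Inside the evaluation script, this sharding can be as simple as slicing the design points by SLURM task ID; a sketch under the assumption that the design points are pre-generated and each task writes its own result file (expensive_model is the same placeholder as before):

import os
import numpy as np

# evaluate_expensive_model.py (sketch): each srun task evaluates its own shard
rank = int(os.environ.get("SLURM_PROCID", 0))
n_tasks = int(os.environ.get("SLURM_NTASKS", 1))

theta_all = np.load("results/theta_train.npy")             # pre-generated design points
my_points = theta_all[rank::n_tasks]                       # round-robin shard for this task

y_local = np.array([expensive_model(t) for t in my_points])
np.save(f"results/y_rank{rank:03d}.npy", y_local)          # one output file per task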
Skills Demonstrated: Surrogate Modeling & UQ
Surrogate Modeling Techniques:
- Polynomial Chaos Expansion (spectral methods)
- Gaussian Processes / Kriging (stochastic methods)
- Design of experiments (Sobol' sequences, space-filling designs)
- Method selection and validation
Uncertainty Quantification:
- Propagation of uncertainty through complex models
- Statistical moments from the surrogate (mean, variance, sensitivity indices)
- Convergence analysis and error estimation
Computational Efficiency:
- Algorithmic acceleration strategies
- HPC deployment (SLURM, parallelization)
- Trade-offs between training cost and prediction speed
Critical Analysis:
- Assumption validation (stationarity testing)
- Diagnostic tools (semivariograms, residual plots)
- Comparative method evaluation (PCE vs. Kriging)
- Understanding failure modes
Broader Applications: Beyond VoI
Metamodeling is increasingly critical for modern ML and scientific computing:
Neural Architecture Search (NAS): Surrogate models predict neural network performance without full training, accelerating architecture optimization by 100x.
Bayesian Optimization: Gaussian Processes guide hyperparameter search, reducing expensive model evaluations needed.
Simulation-Based Inference: PCE and GPs enable likelihood-free Bayesian inference when likelihood functions are intractable but simulations are possible.
Physics-Informed Neural Networks (PINNs): Surrogate models replace expensive PDE solvers, enabling real-time predictions.
Active Learning: Metamodels predict where to sample next for maximum information gain, reducing labeling costs.
Technical Specifications
PCE Implementation:
- Hermite polynomial basis (order 2-4 typical)
- Regression-based coefficient estimation
- Sobol' sequence sampling (quasi-random)
- Python + NumPy + SciPy
Kriging Implementation:
- Ordinary Kriging (constant-mean assumption)
- Exponential variogram model
- Maximum likelihood parameter estimation
- MATLAB + UQLab toolbox
Validation:
- R² coefficient of determination
- Mean Squared Error (MSE)
- Cross-validation (k-fold and holdout)
- Convergence studies (varying training set size)
Computational Environment:
- Linux HPC cluster
- SLURM job scheduler
- 10-50 nodes, 160-800 cores
- ~500-2000 core-hours total
Code Repository: Academic research code available upon request (Python + MATLAB)
For questions about surrogate modeling or computational efficiency strategies, contact me.
Related Projects:
- Model Risk & Decision Support: the VoI framework enabled by this metamodeling work
- Scientific Computing Framework: JAX-based optimization for scientific ML
- Parameter Optimization: another optimization context benefiting from computational efficiency