Machine Learning for Predicting Steel Corrosion in Oil & Gas Infrastructure

Project Overview

For an oil and gas industry client, I developed machine learning models to predict the remaining service life of steel assets (storage tanks, offshore platforms) by forecasting corrosion initiation. Accurate prediction of when protective coatings fail and corrosion begins is critical for optimizing inspection schedules and preventing catastrophic failures in high-value infrastructure.

Impact: Delivered an ML model with 90% prediction accuracy, enabling significant cost savings through data-driven inspection scheduling and maintenance planning.

The Challenge

Business Problem: Steel corrosion is the primary driver of maintenance costs and safety risks in oil & gas operations. Traditional approaches rely on conservative, time-based inspection schedules that either waste resources (inspecting too often) or miss failures (inspecting too late).

Technical Challenge:
- Multiple disparate experimental datasets from different times, locations, and protocols
- Mix of numerical and categorical features requiring sophisticated engineering
- Uncertainty quantification needed for risk management (not just point predictions)

Technical Approach

1. Data Engineering

Challenge: Merge and standardize datasets with inconsistent collection methodologies

Solution:
- Cleaned and harmonized multi-source experimental data
- Standardized variable definitions and units across datasets
- Handled missing values and outliers with domain-informed strategies
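To illustrate the kind of harmonization involved, here is a minimal pandas sketch. The two source tables, their column names, and their units are hypothetical stand-ins rather than the client data, and the median fill and IQR flag are just one reasonable default, not necessarily the strategy used in the project.

```python
import numpy as np
import pandas as pd

# Two hypothetical source datasets with different column names and units.
lab_a = pd.DataFrame({
    "temp_C": [25.0, 40.0, np.nan],
    "coating_um": [250.0, 300.0, 275.0],
    "initiation_days": [900.0, 620.0, 710.0],
})
lab_b = pd.DataFrame({
    "temp_F": [86.0, 122.0],
    "coating_mil": [10.0, 12.0],        # 1 mil = 25.4 micrometres
    "initiation_years": [2.0, 1.5],
})


def harmonize_lab_b(df: pd.DataFrame) -> pd.DataFrame:
    """Convert lab B's columns to lab A's names and units."""
    return pd.DataFrame({
        "temp_C": (df["temp_F"] - 32.0) * 5.0 / 9.0,
        "coating_um": df["coating_mil"] * 25.4,
        "initiation_days": df["initiation_years"] * 365.25,
    })


# Stack both sources and remember where each row came from.
merged = pd.concat(
    [lab_a.assign(source="lab_a"), harmonize_lab_b(lab_b).assign(source="lab_b")],
    ignore_index=True,
)

# Fill missing temperatures with the median of the same source.
merged["temp_C"] = merged.groupby("source")["temp_C"].transform(
    lambda s: s.fillna(s.median())
)

# Flag (rather than silently drop) outliers using the 1.5 * IQR rule.
q1, q3 = merged["initiation_days"].quantile([0.25, 0.75])
iqr = q3 - q1
merged["is_outlier"] = ~merged["initiation_days"].between(
    q1 - 1.5 * iqr, q3 + 1.5 * iqr
)
```

Keeping a `source` column makes it possible to impute within each dataset and, later, to validate across datasets.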

2. Feature Engineering

Numerical Features:
- Environmental conditions (temperature, humidity, salinity)
- Material properties (steel grade, coating thickness)
- Operational parameters (exposure duration, stress levels)

Categorical Features:
- Coating type and application method
- Environmental zone classification
- Steel composition categories

Feature Selection: Sensitivity analysis (SALib) for model reduction, identifying most influential predictors

3. Model Development

Algorithm: Bayesian regression for uncertainty quantification

Why Bayesian:
- Provides prediction intervals, not just point estimates
- Quantifies uncertainty in corrosion initiation time
- Enables risk-based decision-making for inspection scheduling
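To show what a prediction interval looks like in practice, here is a minimal sketch using scikit-learn's `BayesianRidge`. That estimator is an assumption chosen for illustration (the specific Bayesian model is not named above), and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)

# Synthetic stand-in data: two standardized features and a noisy
# "days until corrosion initiation" target.
X = rng.normal(size=(200, 2))
y = 800.0 - 120.0 * X[:, 0] + 60.0 * X[:, 1] + rng.normal(scale=30.0, size=200)

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

model = BayesianRidge()
model.fit(X_train, y_train)

# return_std=True yields the predictive standard deviation per sample.
mean, std = model.predict(X_test, return_std=True)

# Approximate 95% prediction interval under a Gaussian predictive distribution.
lower = mean - 1.96 * std
upper = mean + 1.96 * std

# Fraction of held-out observations that fall inside their interval.
coverage = np.mean((y_test >= lower) & (y_test <= upper))
```

Checking `coverage` against the nominal 95% is a quick sanity test that the intervals are neither too narrow nor too wide.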

Tools: Python, Scikit-learn, SALib (sensitivity analysis)

Validation: Cross-validation on held-out experimental datasets
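One way to hold out entire experimental datasets during cross-validation is to group samples by their source, so no dataset appears in both training and test folds. The sketch below uses scikit-learn's `LeaveOneGroupOut` on synthetic data; the campaign labels and the choice of `BayesianRidge` and R² scoring are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(1)

# Synthetic stand-in data from four hypothetical experimental campaigns.
X = rng.normal(size=(120, 3))
y = 700.0 - 100.0 * X[:, 0] + 40.0 * X[:, 1] + rng.normal(scale=25.0, size=120)
groups = np.repeat(["campaign_a", "campaign_b", "campaign_c", "campaign_d"], 30)

# Each fold trains on three campaigns and tests on the remaining one.
scores = cross_val_score(
    BayesianRidge(),
    X,
    y,
    groups=groups,
    cv=LeaveOneGroupOut(),
    scoring="r2",
)
```

A large drop in score for one held-out campaign would suggest that its protocol or conditions differ from the rest.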

Key Achievements

90% Prediction Accuracy: Model correctly forecasts corrosion initiation timing
Significant Cost Savings: Optimized inspection scheduling based on data-driven predictions rather than conservative fixed intervals
Uncertainty Quantification: Bayesian approach provides prediction intervals for risk management
Actionable Insights: Sensitivity analysis identified which factors most influence corrosion, informing material selection and coating strategies

Data Visualization

Synthesized findings into high-quality visualizations for client decision-making:
- Feature importance rankings
- Prediction intervals vs. observed corrosion times
- Sensitivity analysis results showing dominant risk factors
- Scenario analysis for different environmental conditions

Business Impact

The model enabled the client to:
- Reduce inspection costs by targeting high-risk assets and extending intervals for low-risk ones
- Prevent failures by identifying critical inspection windows with quantified confidence
- Optimize maintenance budgets by allocating resources based on predicted remaining service life
- Improve safety through data-driven risk assessment
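As a simplified illustration of how predictive uncertainty can drive risk-based scheduling, the sketch below converts a predicted mean and standard deviation into an inspection deadline. The asset names, numbers, the 5% risk tolerance, and the normal-distribution assumption are all hypothetical; the client's actual decision rules are not reproduced here.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class AssetForecast:
    name: str
    mean_days: float  # predicted days until corrosion initiation
    std_days: float   # predictive standard deviation in days


def inspection_deadline(forecast: AssetForecast, risk_tolerance: float = 0.05) -> float:
    """Latest day to inspect so that the estimated probability of corrosion
    having already started stays below risk_tolerance (normal assumption)."""
    predictive = NormalDist(forecast.mean_days, forecast.std_days)
    return predictive.inv_cdf(risk_tolerance)


# Hypothetical forecasts for three assets.
forecasts = [
    AssetForecast("tank_01", mean_days=900.0, std_days=60.0),
    AssetForecast("tank_02", mean_days=650.0, std_days=150.0),
    AssetForecast("platform_leg_03", mean_days=1200.0, std_days=90.0),
]

# Earliest deadline first: these assets need attention soonest.
schedule = sorted((inspection_deadline(f), f.name) for f in forecasts)
```

An asset with a short predicted life and wide uncertainty moves to the front of the queue, while a confidently long-lived asset can wait longer between inspections.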

This project demonstrated the power of combining domain expertise (corrosion science) with modern ML techniques (Bayesian regression, sensitivity analysis) to solve high-stakes industrial problems.

Detailed case study coming soon.

For questions about this project, please contact me.