Roadmap Architecture¶
A Program, Not a Project List¶
The roadmap is a coherent 24-month program, not a catalog of independent projects. The distinction matters:
- Projects share data. Every project that connects a CMMS to a historian, or links quality records to process parameters, contributes to a shared data foundation. The second project in a domain costs a fraction of the first.
- Projects share learnings. A maintenance copilot deployed at Cleveland generates the implementation playbook, change management approach, and integration patterns that accelerate deployment at Middletown and Burns Harbor.
- Projects build on each other. Predictive maintenance requires the ops-maintenance data integration that Horizon 1 projects create. Process optimization requires the data maturity that Horizon 2 projects establish.
The program is organized into three progressive horizons. Each represents a step in capability maturity — from closing basic information gaps, to building cross-site analytics, to deploying predictive and optimization models that run on the data foundation the earlier projects created.
Three Horizons: Progressive Capability Building¶
| Horizon 1 (Months 0-6) | Horizon 2 (Months 7-12) | Horizon 3 (Months 13-24) |
|---|---|---|
| "Bridge the Gap" | "Build the Foundation" | "Predict & Optimize" |
| Close the information loops that are costing money today. | Scale across sites, add analytics and optimization. | Predictive models and closed-loop intelligence. |
| Ops-Maintenance Data Integration | Quality & Surface Intelligence | Production Scheduling & S&IOP |
| Knowledge Capture & Virtual SME | Logistics Optimization | Through-Process Intelligence |
| Maintenance Workflow & Procurement | BF & Process Risk Intelligence | Closed-Loop Process Control |
| PdM Proof of Value (in progress) | Process Chemistry Optimization (in progress) | |
| Coke Plant Battery Vision (Burns Harbor) | Caster & Steelmaking Optimization | |
| Quick Wins: paper-to-digital, safety analytics | | |
Progressive Data Foundation — each horizon extends the data layer built by the one before it:
- Horizon 1: CMMS + ops reporting + primary historian linked for the first time. Knowledge bases seeded. Quick wins create new digital data streams.
- Horizon 2: + quality data + process control + cross-site benchmarking. Databricks platform matures. New datamarts (quality, logistics, process chemistry) stand up.
- Horizon 3: + L1/L2 control-system data + full process chain. Real-time predictive and optimization models deployed.
Already Underway¶
The program is not starting from zero. Two projects have been scoped, chartered, and priced in direct response to Chad Asgaard's directives — before the formal roadmap was finalized:
- Cleveland PdM Proof of Value (PRJ-03) — Charter submitted, contract in final approval. Eight-week multi-asset PoV at 1SP targeting BOF bag house (primary), scrubbing system (secondary), and Crane 300 (tertiary), plus a multi-asset data readiness assessment across all reachable 1SP equipment and a data foundation blueprint. ~$236K investment. Chad greenlit March 16 ("get started right away").
- Tilden Concentrator Optimization (PRJ-10) — Charter v3 submitted, Phase 1 priced at $312K firm. Seven-week Proof of Value proving whether existing process signals correlate with recovery outcomes strongly enough to justify the full system ($1.3-1.56M indicative Phase 2). Four-component AI stack around the desliming circuit: desliming optimization model, flotation recovery correlation, beaker test vision, live CRP engine. Chad directed March 24 ("the biggest opportunity... I want to focus high on process optimization").
These two active projects demonstrate the program's prove-then-scale philosophy: bounded investments with hard go/no-go gates, delivering standalone value even if subsequent phases do not proceed.
Both projects are chartered, priced, and awaiting contract execution. Corporate has committed scope and budget; the remaining step is signature. Once the broader program is approved, these two PoVs launch immediately and deliver their first results within 7-8 weeks, informing Gate 0 decisions for the rest of the portfolio.
The Key Insight: Data Centralization Is Not a Separate Project¶
Every Horizon 1 project connects data sources that were never connected before. Every Horizon 2 project adds new data streams and cross-site comparisons. By Horizon 3, the data foundation exists — not because anyone funded a multi-year data platform program, but because each project contributed its piece.
This is critical: we are not asking Cleveland-Cliffs to invest in infrastructure first and wait for returns later. We are proposing solutions that solve immediate operational pain, and the data foundation is the byproduct. The enterprise data platform (Databricks) proceeds in parallel — our architecture is designed to run on it from Day 1.
Technical Foundations¶
The AI techniques vary by project, but five principles apply across the entire portfolio. These are not theoretical commitments; they are operational controls built into every project charter.
Automation Progression¶
Every project follows the same maturity path: advisory, semi-automated, closed-loop. The gate between stages is operator acceptance, not model accuracy. A model that scores 95% precision but that operators do not trust stays in advisory mode. Promotion to semi-automated happens when the operators who use the system in advisory mode confirm that they rely on its recommendations. At roadmap stage, the acceptance criteria are proposed as draft examples only; the final thresholds are set in each project's blueprint. This directly addresses the institutional caution at CLF: the ArcelorMittal 2.5-year failure and Prime Metals experience taught operators to distrust automation that was imposed rather than earned.
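The promotion gate described above can be sketched as a simple decision rule. This is an illustrative draft only, consistent with the note that final thresholds are set per-project: the 0.80 acceptance threshold, stage names, and function signature are assumptions, not charter values.

```python
# Illustrative sketch of the advisory -> semi-automated -> closed-loop
# promotion gate. The 0.80 acceptance threshold is a hypothetical draft
# value; real thresholds are set in each project's blueprint.
STAGES = ["advisory", "semi-automated", "closed-loop"]

def promotion_decision(current_stage, model_precision, operator_acceptance,
                       acceptance_threshold=0.80):
    """Promote one stage at a time, gated on operator acceptance.

    model_precision is recorded but deliberately NOT the gate --
    a 95%-precision model that operators distrust stays in advisory mode.
    """
    idx = STAGES.index(current_stage)
    if idx < len(STAGES) - 1 and operator_acceptance >= acceptance_threshold:
        return STAGES[idx + 1]
    return current_stage

# High precision, low trust: the model stays in advisory mode.
print(promotion_decision("advisory", model_precision=0.95, operator_acceptance=0.40))
# Accepted by operators: advance exactly one stage.
print(promotion_decision("advisory", model_precision=0.88, operator_acceptance=0.85))
```

The point of the sketch is the asymmetry: precision appears in the signature but never in the condition.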
Data Quality Gates¶
The first 2-3 weeks of every project are a formal data assessment. If data quality falls below the threshold required for the intended analytical approach, the project pivots to data remediation (instrumentation, digitization, process capture) before modeling begins. No project proceeds on the assumption that the data is ready. At roadmap stage, threshold examples can be proposed, but the final criteria are defined in the project blueprint once the real source data is inspected. This prevents wasted modeling effort and builds credibility with plant teams, who know their data gaps better than anyone.
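A minimal sketch of what such a gate could check, assuming per-field completeness as the example metric. The 0.90 threshold, field names, and sample records are invented for illustration; real criteria come from the blueprint after inspecting actual source data.

```python
# Illustrative data-quality gate: per-field completeness against a draft
# threshold. Field names and the 0.90 threshold are hypothetical.
def data_quality_gate(records, required_fields, threshold=0.90):
    """Return (per-field completeness, gate decision)."""
    n = len(records)
    completeness = {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / n
        for field in required_fields
    }
    if all(score >= threshold for score in completeness.values()):
        decision = "proceed to modeling"
    else:
        decision = "pivot to data remediation"
    return completeness, decision

# Hypothetical CMMS work-order extract with a sparse failure-code field.
records = [
    {"asset_id": "BH-01", "failure_code": "F12", "downtime_min": 45},
    {"asset_id": "BH-01", "failure_code": None, "downtime_min": 30},
    {"asset_id": "CR-300", "failure_code": None, "downtime_min": 15},
    {"asset_id": "CR-300", "failure_code": "F03", "downtime_min": 60},
]
scores, decision = data_quality_gate(records, ["asset_id", "failure_code"])
print(decision)  # failure_code is only 50% complete -> remediation
```

In practice the assessment would cover more than completeness (timestamp alignment, sensor coverage, label quality), but the gate logic is the same: measure first, model only if the measurement clears the bar.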
Validation by Backtesting¶
The consistent answer to "how do you know it works" is use-case-specific validation tied to plant outcomes. Predictive models are validated against historical failures, quality events, and process deviations. If the model would have caught the bag house failure that caused last month's shutdown, or flagged the off-chemistry heat that produced the customer claim, the evidence speaks for itself. For retrieval, deduplication, and workflow agents, validation centers on grounded-answer quality, sampled human review precision, and cycle-time or adoption improvement. Backtesting on known incidents remains the primary validation method for predictive use cases before any production deployment.
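The backtesting idea can be sketched as a replay against known incidents: a historical failure counts as "caught" if the model would have raised an alert within a lead window before it. This is a simplified illustration; the timestamps and the 7-day window are invented.

```python
# Sketch of backtesting a predictive model against known historical
# incidents. A failure is "caught" if at least one alert fired inside
# the lead window before it. All timestamps are invented examples.
from datetime import datetime, timedelta

def backtest(alert_times, failure_times, lead_window=timedelta(days=7)):
    """Return (failures caught, total failures) over known incidents."""
    caught = sum(
        1 for failure in failure_times
        if any(failure - lead_window <= alert < failure for alert in alert_times)
    )
    return caught, len(failure_times)

alerts = [datetime(2025, 2, 10), datetime(2025, 3, 1)]
failures = [
    datetime(2025, 2, 14),  # alert fired 4 days prior -> caught
    datetime(2025, 3, 20),  # no alert within 7 days -> missed
]
print(backtest(alerts, failures))  # -> (1, 2)
```

A production backtest would also count false alarms (alerts with no failure in the window), since operator trust depends on both sides of that ratio.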
Explainability¶
Operators will not follow a recommendation they cannot understand. Every predictive model exposes the reasoning behind its output using feature importance analysis (SHAP values). "Temperature trend contributed 60%, vibration amplitude 25%, operating hours 15%." The explanation is expressed in terms operators already understand: sensor readings, process parameters, equipment conditions. For knowledge capture systems (PRJ-09), explainability takes a different form: citation-mandatory responses where every answer must reference its source document.
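The "temperature 60%, vibration 25%" style of output can be produced by normalizing per-feature attributions (such as SHAP values) into ranked percentage shares. A minimal sketch, with invented feature names and contribution magnitudes; real values would come from an explainability library rather than being hard-coded:

```python
# Sketch: turn signed per-feature attributions (e.g. SHAP values) into
# the ranked percentage explanation operators see. Feature names and
# contribution values are invented for illustration.
def explain(contributions):
    total = sum(abs(v) for v in contributions.values())
    shares = {k: round(100 * abs(v) / total) for k, v in contributions.items()}
    ranked = sorted(shares.items(), key=lambda kv: -kv[1])
    return ", ".join(f"{name} {pct}%" for name, pct in ranked)

print(explain({
    "temperature_trend": 0.48,    # pushes the prediction toward "failure"
    "vibration_amplitude": 0.20,
    "operating_hours": -0.12,     # sign is direction; magnitude drives the share
}))
# -> temperature_trend 60%, vibration_amplitude 25%, operating_hours 15%
```

The translation step matters as much as the attribution: the operator sees sensor names and percentages, not raw model internals.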
OT Security: One-Way Data Flow¶
Data flows from OT to cloud only, never cloud to OT. IE's Industrial Data Platform handles extraction across the OT/IT boundary using standard industrial patterns (historian replication, DMZ). Vooban's AI layer has read-only access to replicated data in the cloud. No H1 model, agent, or dashboard writes directly back to control systems. This is fundamental to the ISA-95 positioning: the AI layer operates above the plant floor as a cloud analytics layer, consuming data from L0-L1 and pushing predictions to L4 (enterprise/EAM), not directly to L2 (control) or L3 (operations execution). For projects such as PRJ-10, the initial operating mode is human-facing recommendation; the intent is to avoid closing the control loop too early, and any tighter integration is a later control decision subject to OT/security approval.
MLOps and Model Lifecycle¶
CLF is evaluating enterprise data platforms (Databricks, Snowflake, Fabric) at the corporate level. MLOps tooling depends heavily on which platform lands. H1 projects are designed for cross-site portability across the Cleveland-Cliffs footprint: standard Python, containerized training, open model formats, and reusable data/model assets. The implementation is not fully portable across platforms; orchestration, serving, and transformation details still depend on the selected stack. Centralized model monitoring, automated retraining pipelines, and drift detection are scoped for H2 once the platform decision is made and enough models are in production to justify the investment. The data strategy (companion document) details the ML model lifecycle per platform option.
Milestone Gate Structure¶
Beyond the Gate 0 launch decision, the program has two defined checkpoints where corporate leadership evaluates progress and decides whether to continue, redirect, or stop. This is not "sign a two-year contract." It is "commit to six months, then evaluate."
Gate 0 — Program Launch (Month 0)¶
Starting position: Two projects are already past Gate 0 — the Cleveland PdM PoV (PRJ-03) and Tilden Concentrator Optimization (PRJ-10) launched on Chad Asgaard's direct authority before the formal program gate. Gate 0 for the broader program ratifies what is already working and launches the remaining H1 projects.
Investment: Phase 1 budget approved for remaining H1 initiatives at 2-3 entry sites.
Deliverable: Detailed project charters with measurable 8-week outcomes (per Chad Asgaard's directive). Site leadership accountable for specific KPIs. Early status from PRJ-03 and PRJ-10 presented as evidence of execution capability.
Decision: GO — commit to the 6-month prove-it phase, building on the momentum of the two projects already in motion.
Gate 1 — Proof of Value (Month 6)¶
Evidence required:
- Phase 1 results at entry sites — quick-win KPIs measured against charter targets
- Data connections operational (CMMS-to-historian, ops-to-maintenance linkage)
- Champions engaged and adoption underway at entry sites
- First financial returns quantified
Decision options:
- CONTINUE — expand to Phase 2 (add sites, add Horizon 2 projects)
- PAUSE — fix adoption or integration issues before scaling
- REDIRECT — shift resources to projects showing stronger returns
Off-ramp: At this point, CLF has invested in H1 delivery at 2-3 sites. If returns are below expectations, the data connections and knowledge bases created still have lasting value. The program pauses without sunk-cost pressure.
Gate 2 — Foundation in Place (Month 12)¶
Evidence required:
- Two or more sites operational per workstream
- ROI confirmed at entry sites with quantified financial impact
- Data platform maturing — cross-site data flowing
- Second-site deployment proving the "prove-then-scale" model works
Decision options:
- CONTINUE — proceed to Phase 3 optimization and prediction
- CONSOLIDATE — hold at current scope, maximize value from deployed projects
- EXPAND — add workstreams that were initially deferred
Off-ramp: Cumulative returns measured against cumulative investment. The self-funding test: are H1 savings covering H2 investment?
Horizon 3 — Program Maturity (Months 13-24)¶
After Gate 2, the program runs on established momentum. H3 is resource-intensive but builds entirely on a proven foundation: existing data pipes and established team relationships. The investments are larger per phase but lower-risk because the architecture, champions, and operating patterns are proven. No formal corporate gate is required — H3 proceeds on the authority earned by H1 and H2 results. Corporate retains the option to adjust pace or scope at any time.
The Self-Funding Cascade¶
The program is designed to generate returns early enough to fund its own expansion:
| Period | What Happens | Gate Decision |
|---|---|---|
| Months 0-6 | Corporate funds the prove-it phase (H1 projects at entry sites). | -- |
| Months 3-6 | Quick-win value starts materializing: inventory reduction frees working capital; procurement automation reduces cycle cost; knowledge capture reduces training and ramp-up time; ops-maintenance integration reduces misattributed delays. | -- |
| Months 6-12 | Phase 1 returns begin to offset Phase 2 investment. Demonstrated ROI justifies continued funding. Site GMs see value and buy in for expansion. | Gate 1: Corporate evaluates and decides on Phase 2. |
| Months 12-24 | Self-sustaining program. Cumulative returns exceed cumulative investment. Data foundation built by H1/H2 projects enables H3 at lower marginal cost. | Gate 2/3: Corporate decides on optimization-stage investment. |
This structure directly addresses the budget reality observed at every site: there is no standing discretionary AI budget. The first phase must be funded as an investment, but by Months 6-12, the program's own returns should justify continuation. Each milestone gate is a decision point, not a contractual obligation. By the time H3 begins, the program has earned its own momentum.
\pagebreak
Prove-Then-Scale: The Cross-Site Model¶
Every project follows the same deployment pattern across the CLF footprint:
| Phase | What Happens |
|---|---|
| Prove | Deploy MVP at entry site. Measure against charter KPIs. Build implementation playbook. |
| Stabilize | Refine based on real-world use. Confirm ROI. Document lessons. Prepare playbook for second site. |
| Scale | Deploy at second site using adapted playbook. Local champion leads adoption. |
| Expand | Roll out across remaining sites. Each deployment is faster than the last — playbook refined, data patterns understood, integration patterns proven. |
Entry site selection is evidence-driven: strongest champion, most accessible data, most favorable implementation environment. Each project chapter (Chapter 5) specifies the recommended entry site and rollout sequence with rationale.
What This Is Not¶
- Not a technology deployment. The roadmap does not prescribe specific AI tools, ML frameworks, or vendor products. It prescribes information flow solutions that happen to use AI as the mechanism.
- Not a data lake program. There is no "Phase 0: build the data platform." The data platform emerges from the project work.
- Not a two-year commitment. It is a six-month commitment with a gate to evaluate and decide, repeated four times.
- Not a consulting engagement that never ends. Gate 3 explicitly targets internalization — CLF running the program with its own people and the tools built during the program.