Cost-Benefit Methodology¶
Why We Take This Approach¶
Most consulting engagements estimate value from the top down: "steel companies typically save X% with AI." We do the opposite. Every number in this document traces to a specific operational metric at a specific Cleveland-Cliffs site, grounded in conversations with the people who live with that metric every day.
We spent four weeks across four facilities — Cleveland Works, Middletown Works, Tilden Mine, and Burns Harbor — conducting over 120 practitioner conversations. The value estimates that follow are built from what your operators, maintenance leaders, and process engineers told us about what costs money, how much, and how often. Where we use industry benchmarks at all, it is only to sanity-check what your own people reported.
This chapter describes the methodology. Chapter 7 applies it to the full portfolio.
How We Estimate Value¶
Six Value Categories¶
Each recommended workstream generates value through one or more of these mechanisms:
| Category | Definition | How We Measure | Example |
|---|---|---|---|
| Throughput gain | More production from existing assets | Tons/year x margin/ton | Reducing unplanned downtime frees additional heats per day at the constrained asset |
| Cost avoidance | Preventing failures and reducing waste | Event cost x frequency reduction | Cobble cost per event x cobble frequency x prevention rate |
| Direct cost reduction | Spending less on inputs | Current spend x % reduction | $50M annual reagent spend x 5% optimization = $2.5M |
| Efficiency gain | Same output with less effort | Hours saved x loaded labor cost | 2 hours/day of manual logistics planning x 365 days x labor rate |
| Risk mitigation | Reducing probability or impact of costly events | Expected loss x probability reduction | Environmental compliance exposure, safety incident cost, knowledge-loss impact |
| Inventory optimization | Reducing carrying costs | Inventory value x carrying cost % x reduction % | $104M parts inventory x 25% carrying cost x 10% reduction |
Most workstreams touch two or three categories. We account for each separately and sum them — we do not double-count.
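As a sketch of how two categories combine without double-counting, the arithmetic for a workstream spanning direct cost reduction and cost avoidance might look like this. The reagent figures are the table's own worked example; the cobble inputs are hypothetical placeholders, not numbers reported by CLF stakeholders:

```python
# Illustrative only: the cobble inputs below are hypothetical placeholders,
# not figures from Cleveland-Cliffs conversations.

def direct_cost_reduction(annual_spend, pct_reduction):
    """Direct cost reduction: current spend x % reduction."""
    return annual_spend * pct_reduction

def cost_avoidance(event_cost, events_per_year, prevention_rate):
    """Cost avoidance: event cost x frequency x prevention rate."""
    return event_cost * events_per_year * prevention_rate

# Reagent optimization (the table's worked example): $50M x 5%
reagent = direct_cost_reduction(50_000_000, 0.05)

# Cobble prevention: hypothetical $200K/event, 12 events/yr, 25% prevented
cobble = cost_avoidance(200_000, 12, 0.25)

# Each category is computed separately, then summed exactly once
total = reagent + cobble
print(f"Addressable value: ${total:,.0f}")
```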
The Five-Step Estimation Process¶
For each workstream at each site, we follow the same disciplined process:
Step 1 — Identify the operational metric. What did your people tell us costs money? We trace every estimate back to specific conversations documented in the site initiative registries, with direct quotes and context.
Step 2 — Quantify the current state. What is the metric today? For example: 70% reactive maintenance mix, $50M in annual reagent spend, or 40 coil-movement trips per day. Each data point is tagged by source — whether it was confirmed by a stakeholder, estimated from available data, or benchmarked from comparable operations.
Step 3 — Define a realistic target state. Not a theoretical optimum — an achievable improvement given your systems, your people, and your operating constraints. We benchmark targets against CLF's own best-performing sites where possible, and against industry comparables where necessary. We always present a conservative target alongside a stretch target.
Step 4 — Calculate the delta. The gap between current state and target state, multiplied by unit economics, gives us addressable value. We then apply a capture rate — because no initiative captures 100% of addressable value. Our capture rates are:
- 30-50% for early-phase quick wins (Horizon 1)
- 50-70% for established initiatives with proven data (Horizon 2)
- 70-90% for mature, fully integrated programs (Horizon 3)
Step 5 — Sensitivity check. What if the base metric is off by 20%? What if the improvement is half of what we estimate? We run these scenarios and present every estimate as a range: conservative, expected, and optimistic.
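The five steps reduce to a short calculation. A minimal sketch, using the manual-logistics-planning example from the value table with a hypothetical loaded labor rate (the Horizon 1 capture rates are from Step 4 above):

```python
# Structure is the methodology's; all numeric inputs except the capture
# rates are hypothetical placeholders.

H1_CAPTURE = (0.30, 0.50)  # Horizon 1 quick-win capture rate range

def estimate(current, target, unit_value, capture_rate):
    """Steps 3-4: (current - target) x unit economics x capture rate."""
    return (current - target) * unit_value * capture_rate

# Step 2: 2.0 hrs/day of manual planning today; Step 3: target 0.5 hrs/day.
# Unit economics: hypothetical $120/hr loaded labor rate x 365 days/year.
unit_value = 120 * 365

conservative = estimate(2.0, 0.5, unit_value, H1_CAPTURE[0])
optimistic   = estimate(2.0, 0.5, unit_value, H1_CAPTURE[1])

# Step 5: stress the base metric by -20% and re-run.
stressed = estimate(2.0 * 0.8, 0.5, unit_value, H1_CAPTURE[0])
print(f"${conservative:,.0f} - ${optimistic:,.0f} (stressed: ${stressed:,.0f})")
```

Because the estimate is decomposed, any single input (rate, days, target, capture) can be swapped and the range recomputed.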
What We Do Not Do¶
- No industry hand-waving. "Industry reports say steel companies save $X with AI" is not evidence. Our numbers come from CLF's own operations.
- No T-shirt sizing. "Small, medium, large" is indefensible in a business case. Every estimate has math behind it.
- No single-point estimates. Always ranges. Always stated assumptions. Always decomposed into components you can verify independently.
How We Estimate Cost¶
Phase-Type Template Methodology¶
Cost estimates follow the same transparency principle as value estimates. Rather than building each estimate from scratch (which introduces subjective drift across 30 project-phases), we classify each phase into a structural archetype and derive team composition from that classification. The archetypes are anchored to two charter actuals — projects where the scope, team, and cost have been worked through in detail.
The anchors (two firm charters, one indicative):

- PRJ-10 Tilden Concentrator Phase 1: $312,000 (3.25 FTE, 8 weeks, firm charter)
- PRJ-03 Cleveland PdM Phase 1: $236,000 (2.4 FTE, 8 weeks, firm charter)
- PRJ-10 Tilden Concentrator Phase 2: $1.3-1.56M (9.5-11 FTE, 12 weeks, indicative charter)
Five Phase Types¶
Every project-phase is classified into one of five structural archetypes based on observable properties — number of sites, data sources, AI complexity, deployment depth, and organizational change required.
| Type | What It Is | Team Shape | Duration | Cost Range |
|---|---|---|---|---|
| A: Proof of Value | Bounded experiment, static data, prove the thesis | 2.5-4 FTE, lean | 7-10 weeks | $200-350K |
| B: Operational Pilot | Working systems, may span 2 sites, more complex ML | 4-7 FTE | 10-14 weeks | $400-750K |
| C: Full System Build | Live integration, plant-floor, operator-facing | 7-11 FTE + IE domain | 12-16 weeks | $1.0-1.6M |
| D: Site Extension | Proven system adapted to new site | 3-5 FTE | 8-12 weeks | $250-500K |
| E: Cross-Site Platform | Enterprise analytics, corporate dashboards | 4-7 FTE | 10-14 weeks | $400-800K |
Classification is based on structural properties, not subjective judgment. For example, a phase with physics-informed ML, plant-floor deployment, and operator training is a Type C regardless of which project it belongs to. Complexity modifiers (dual-site parallel, CV components, legacy system integration) adjust within the type range.
Cost Components Within Each Phase¶
| Component | What It Includes | How It Maps |
|---|---|---|
| AI/ML development | Model development, training, validation, prompt engineering | "ai" topic features in the feature audit trail |
| Data engineering | Pipelines, ETL, data quality, integration | "dev python" and "bi" topic features |
| Application & UX | Dashboards, operator interfaces, workflow integration | "dev" topic features |
| Infrastructure | Cloud compute, pipeline orchestration, model serving | "devops" topic features |
| IE domain roles | Metallurgists, process engineers, change management | Non-feature roles added in team sizing |
| Project management | Coordination, stakeholder alignment, gate management | 0.25-0.50 FTE per phase |
| Contingency | 20% buffer on hours | Applied uniformly |
Blended rate: $250/hr across all resources (Vooban + IE). Charter actuals used differentiated rates ($175-$300); the blended rate is calibrated to reproduce charter totals within acceptable variance (±15%).
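The blended-rate model can be checked against the charter anchors directly. A quick sketch, assuming a 40-hour work week (our assumption; the chapter does not state hours per week):

```python
BLENDED_RATE = 250       # $/hr across all resources (Vooban + IE)
CONTINGENCY = 0.20       # 20% buffer on hours, applied uniformly
HOURS_PER_WEEK = 40      # assumption, not stated in the methodology

def phase_cost(fte, weeks):
    """FTE x weeks x hours/week, plus contingency, at the blended rate."""
    hours = fte * weeks * HOURS_PER_WEEK * (1 + CONTINGENCY)
    return hours * BLENDED_RATE

print(f"${phase_cost(3.25, 8):,.0f}")  # $312,000: matches PRJ-10 Phase 1
print(f"${phase_cost(2.40, 8):,.0f}")  # $230,400: within the stated ±15% of PRJ-03's $236,000
```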
Phased Cost Structure¶
Every project is costed per phase, not as a lump sum. This matches the milestone-gate structure described in Chapter 4:
- Phase 1 — Prove it at a single site. Smallest investment, highest learning. Typically Type A or B.
- Phase 2 — Build and scale. Full system build at entry site, often with parallel scope at a second site. Typically Type C.
- Phase 3 — Platform and expand. Cross-site deployment, corporate dashboards. Typically Type D or E.
- Annual run rate — 15-20% of build cost per year for monitoring, model maintenance, and support.
This structure means Cleveland-Cliffs is never committing to a large program upfront. Each phase has a defined budget, a defined deliverable, and a go/no-go gate before the next phase begins. The total program ($22.6M over 24 months) flows through gates — the maximum at-risk amount at any point is the current phase investment.
Auditability¶
Every cost figure traces through a four-step chain:
- Project description (ch5)
- Feature decomposition (features.tsv)
- Phase classification
- Team + cost (team.md)
Features are the audit trail — they document what gets delivered in each phase and justify the classification. The classification drives the team template. The team template produces the cost. If any number is challenged, the chain shows exactly where each assumption enters.
The Self-Funding Logic¶
A critical design principle of this roadmap is that early phases generate measurable returns that offset investment in later phases. The program is structured so that corporate funds the initial prove-it phase, and demonstrated value builds the case for continued investment.
| Period | Activity | Milestone Gate |
|---|---|---|
| Months 0-6 | Phase 1 investment: corporate funds the prove-it phase across selected workstreams. | -- |
| Months 3-6 | Quick-win value materializes: inventory reduction frees working capital; procurement automation reduces cycle cost; knowledge capture reduces training time and risk. | -- |
| Months 6-12 | Phase 1 returns offset Phase 2 investment. Demonstrated ROI justifies continued funding. Site leadership sees tangible value and buys in for expansion. | Corporate decides to continue, pause, or redirect. |
| Months 12-24 | Self-sustaining program. Cumulative returns exceed cumulative investment. Data foundation built by early projects enables advanced analytics at lower marginal cost. | Corporate decides on full-scale investment. |
This is not theoretical. The workstream sequencing in Chapter 5 is deliberately ordered so that high-confidence, shorter-payback initiatives come first. They prove the approach, generate savings, and create the data infrastructure that makes subsequent phases faster and cheaper to deliver.
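The self-funding mechanic is, at bottom, a cumulative cash-flow check. A sketch with hypothetical quarterly figures (placeholders, not the Chapter 7 portfolio numbers):

```python
# Hypothetical quarterly cash flows ($M) illustrating the self-funding logic.
investment = [1.5, 1.5, 2.5, 2.5, 3.0, 3.0, 2.0, 2.0]
returns    = [0.0, 0.5, 1.5, 2.5, 3.5, 4.5, 5.0, 5.5]

cum_net = 0.0
breakeven_quarter = None
for q, (cost, ret) in enumerate(zip(investment, returns), start=1):
    cum_net += ret - cost
    if breakeven_quarter is None and cum_net > 0:
        # First quarter where cumulative returns exceed cumulative investment
        breakeven_quarter = q

print(f"Break-even in Q{breakeven_quarter}, cumulative net ${cum_net:+.1f}M")
```

With these placeholder flows, cumulative net turns positive in the seventh quarter, consistent with the Months 12-24 row of the table above.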
Confidence and Sensitivity¶
How We Express Confidence¶
Every value estimate in Chapter 7 carries a confidence indicator:
| Confidence Level | What It Means | Typical Basis |
|---|---|---|
| High | Base metric confirmed by multiple stakeholders; improvement rate benchmarked against CLF's own performance | Stakeholder-confirmed numbers, internal CLF data |
| Medium | Base metric confirmed by at least one stakeholder; improvement rate based on comparable operations | Single-source confirmation, analogous site data |
| Low | Base metric estimated from partial data; improvement rate based on industry benchmarks | Limited site data, external references |
We default to the conservative end of every range. Where confidence is low, we flag it explicitly and explain what additional data would sharpen the estimate.
Sensitivity Analysis¶
For the portfolio as a whole and for each workstream individually, we test three scenarios:
- What if our value estimates are 30% too high? Does the program still pay back? At what timeline?
- What if costs run 25% over estimate? Where does that push the break-even point?
- What if we capture only half the improvement we project? Which workstreams still stand on their own?
These scenarios are presented in Chapter 7. The purpose is straightforward: you should be able to see that even under pessimistic assumptions, the recommended program represents a sound investment — and if it does not, you should see exactly where the economics become marginal.
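Because estimates are decomposed, these stress tests are mechanical. A sketch: the $22.6M program cost is from this chapter, while the $40M expected portfolio value is a hypothetical placeholder, not a Chapter 7 figure:

```python
def pays_back(value, cost, value_haircut=1.0, cost_overrun=1.0):
    """Does stressed value still exceed stressed cost?"""
    return value * value_haircut > cost * cost_overrun

VALUE, COST = 40.0, 22.6  # $M; VALUE is a hypothetical placeholder

scenarios = {
    "value 30% too high": pays_back(VALUE, COST, value_haircut=0.70),
    "costs 25% over":     pays_back(VALUE, COST, cost_overrun=1.25),
    "half the capture":   pays_back(VALUE, COST, value_haircut=0.50),
}
for name, ok in scenarios.items():
    print(f"{name}: {'still pays back' if ok else 'economics become marginal'}")
```

With the placeholder value, the first two scenarios survive and the third fails, flagging exactly where the economics become marginal, which is the behavior the analysis is meant to expose.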
Assumptions We Are Making¶
For full transparency, here are the standing assumptions across all estimates:
- Labor rates use loaded costs (salary + benefits + overhead), not base salary
- Throughput gains assume the market can absorb additional production (i.e., demand is not the constraint)
- Capture rates assume adequate change management and champion engagement at each site
- Infrastructure costs assume CLF proceeds with Databricks as the enterprise data platform; estimates would increase if each workstream must build its own data infrastructure independently
- Timeline estimates assume dedicated CLF resources are assigned to each active workstream as outlined in the resource plan
Any of these assumptions can be adjusted. The decomposed structure of our estimates means you can change a single input and see the effect ripple through to the bottom line.