Why the metrics you trust might be measuring the wrong thing — and how to tell
Every operational leader has a dashboard.
A set of metrics that they review regularly, that they hold their teams accountable against, that they use to form a picture of how the business is performing. Schedule adherence. OEE. OTIF. Scrap rate. Forecast accuracy. Inventory turns.
The dashboard is trusted. It has been built carefully, reviewed regularly, and refined over time. When a metric moves, leadership responds. When a metric improves, it is taken as evidence that the operation is improving.
Sometimes that is exactly what is happening.
Sometimes the operation hasn’t changed at all.
The metric has changed.
This post is about measurement system failure — not in the technical sense, not in the gauge repeatability and reproducibility sense, but in the operational sense. The way that the metrics that are supposed to describe a business’s performance begin, gradually and imperceptibly, to describe something else entirely.
And the consequences that follow when leadership mistakes one for the other.
1. How Metrics Degrade
Metrics don’t fail suddenly. They degrade gradually, through predictable mechanisms.
Definition drift. Every metric has a definition. What is included. What is excluded. What counts as conforming. What timeframe is used. Over time, those definitions shift — through informal agreement, through system configuration changes, through the accumulated weight of individual decisions about how to handle edge cases.
Each individual decision seems reasonable. Together, they produce a metric that measures something meaningfully different from what it measured when it was designed.
Schedule adherence that was defined as orders shipped on the customer-requested date begins to be measured against a revised date, one that reflects the latest confirmed delivery date after the original was missed. The metric improves because the yardstick has been quietly moved. The customer experience hasn't.
Gaming. When performance is measured, people manage to the measure. When the measure is imperfectly designed, managing to the measure produces behaviour that is good for the metric and bad for the operation.
A scrap metric that measures scrapped units as a percentage of units produced creates an incentive to overproduce. More units produced means a lower scrap percentage for the same absolute scrap volume. The metric improves. The waste doesn’t.
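A minimal sketch of that arithmetic, with illustrative numbers. The absolute waste is identical in both runs; the reported rate falls simply because more units were made:

```python
def scrap_rate_by_units(scrapped_units: int, units_produced: int) -> float:
    """Scrap as a percentage of units produced: the gameable form."""
    return 100 * scrapped_units / units_produced

# Same 50 scrapped units against two production volumes (illustrative):
print(scrap_rate_by_units(50, 1_000))  # 5.0 -- the run the schedule asked for
print(scrap_rate_by_units(50, 1_250))  # 4.0 -- overproduce and the metric "improves"
```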
An OEE metric that excludes planned downtime creates an incentive to reclassify unplanned downtime as planned. The OEE improves. The asset reliability doesn’t.
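The same mechanism, sketched for the availability component of OEE (illustrative numbers; full OEE also multiplies in performance and quality). The asset runs exactly the same 126 hours in both cases; only the classification of the downtime changes:

```python
def availability(total_hours: float, planned_stops: float,
                 unplanned_stops: float) -> float:
    """Availability component of OEE: run time over planned production time.
    Planned stops are excluded from the denominator, so reclassifying an
    unplanned stop as planned lifts the figure without touching the asset."""
    planned_production_time = total_hours - planned_stops
    run_time = planned_production_time - unplanned_stops
    return 100 * run_time / planned_production_time

# Illustrative week: 168h total, 28h genuinely planned stops, 14h breakdowns.
print(availability(168, 28, 14))  # 90.0
# Reclassify 7 of the breakdown hours as "planned maintenance":
print(availability(168, 35, 7))   # ~94.7
```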
These behaviours are not dishonest in a simple sense. They are rational responses to a measurement system that creates the wrong incentives. The fix is not better discipline. It is better measurement design.
Scope creep. Metrics are designed for a specific scope. Over time, the scope changes — products are added, processes change, the definition of the value stream evolves — but the metric doesn’t change with it. The thing being measured and the thing the business cares about have diverged.
Customer OTIF measured at despatch rather than at receipt. The logistics time between despatch and delivery is outside operations’ control, so it was excluded from the measure. The customer’s experience of on-time delivery includes it. The metric measures operational compliance. The customer is measuring something else.
2. The Signal That Is Lost
Every degraded metric involves the same underlying failure: a signal that the business needs is being replaced by noise that the business is mistaking for a signal.
The schedule adherence that looks good but is being measured against revised dates is not telling the business that delivery performance is strong. It is hiding the information that customer commitments are being broken — information that should be triggering commercial conversations, root cause analysis, and operational redesign.
The OEE figure that excludes certain categories of downtime is not telling the business that asset reliability is improving. It is hiding the true reliability picture — information that should be informing maintenance strategy, capital investment decisions, and production scheduling.
The scrap rate that is deflated by overproduction is not telling the business that quality is stable. It is hiding the absolute volume of waste being generated — information that should be driving process capability improvement and standard work discipline.
In each case, the degraded metric is providing false reassurance. It is allowing the business to believe that performance is acceptable when it isn’t. It is suppressing the discomfort that should be driving improvement.
And in most businesses, this is happening across multiple metrics simultaneously. The dashboard that looks healthy is a collection of individually degraded signals — each one plausible, none of them quite right.
3. The Audit That Nobody Does
There is a specific quality assurance activity that almost no business performs regularly.
The measurement system audit.
Not in the metrology sense — not the calibration of instruments and the assessment of gauge error. In the operational sense: a structured review of whether the metrics in use are measuring what they are supposed to measure, with the definitions they were designed with, in the scope they were intended for.
This audit asks: has the definition of this metric changed since it was designed? Has the scope it covers changed? Is there any systematic reason why the people being measured might be managing their behaviour to influence the metric rather than the underlying performance?
And it asks the hardest question: is there a gap between what this metric reports and what someone walking the process would observe?
That last question is the most diagnostic. If the metric says schedule adherence is 94% but an independent observer, watching the process for a week, would estimate it at 80% — the metric has failed.
The measurement system audit is uncomfortable because what it finds reflects poorly on the people who designed the measurement system and the people who have maintained it. It requires the willingness to find out that the numbers that have been trusted, reported, and celebrated are not as reliable as they appeared.
That discomfort is exactly why it is valuable. And exactly why it rarely happens.
4. The Perverse Behaviours Inventory
In every business, there are behaviours that exist specifically to produce a good metric outcome rather than a good operational outcome.
They are worth cataloguing explicitly, because once visible they are usually addressable, and until catalogued they stay hidden precisely because they are successful. The metric looks good, so nobody asks what is producing the good number.
Common examples.
Closing jobs early. Manufacturing orders are closed in the system before the product is actually complete, in order to record on-time completion. Work that remains is carried forward under a new order or a rework code. Schedule adherence improves. Actual completion timing doesn’t.
Inventory month-end management. Stock is moved between locations, reclassified, or temporarily excluded from the count in the days before month-end, in order to report an inventory position that meets the target. The underlying inventory position hasn't changed. The metric has.
Premium freight miscoding. Express freight that was driven by an operational failure is coded to a project code, a customer request code, or a commercial decision — anything that removes it from the premium freight line in the operational P&L. The cost is real. The metric is clean.
Defect reclassification. Defective units that would be scrapped are reclassified as concession product, sold at reduced price, or held as rework. The scrap metric improves. The quality problem remains.
Forecast accuracy window adjustment. The window over which forecast accuracy is measured is adjusted to the period where accuracy is naturally highest. Longer-horizon inaccuracy, which reflects the quality of the planning process, is excluded.
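A minimal sketch of that last adjustment, assuming the common 1 − MAPE form of forecast accuracy and illustrative numbers. The near-term forecast is naturally accurate; the horizon the planning process actually commits capacity and material against is not:

```python
def forecast_accuracy(forecasts: list[float], actuals: list[float]) -> float:
    """Forecast accuracy in the common 1 - MAPE form (illustrative)."""
    errors = [abs(f - a) / a for f, a in zip(forecasts, actuals)]
    return 100 * (1 - sum(errors) / len(errors))

actual_demand   = [100, 120, 90, 110]
frozen_at_lag_1 = [98, 118, 93, 108]   # forecast snapshotted one week out
frozen_at_lag_3 = [80, 150, 60, 140]   # snapshotted at the planning horizon
print(forecast_accuracy(frozen_at_lag_1, actual_demand))  # ~97.8
print(forecast_accuracy(frozen_at_lag_3, actual_demand))  # ~73.6
```

Measure at lag 1 and the planning process looks excellent. The decisions it drives were made at lag 3.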
None of these are unique. Every manufacturing business has some version of them. The question is whether leadership knows they are happening — and whether the measurement system is designed to make them visible or to make them easy.
5. Designing Metrics That Are Hard to Game
The principle is straightforward: metrics that measure outcomes rather than activities are harder to game than metrics that measure activities alone.
Customer OTIF measured at the point of customer receipt is harder to game than OTIF measured at despatch. The logistics performance between the two points becomes visible and accountable.
Scrap measured as a percentage of material input rather than units produced is harder to inflate through overproduction. The absolute cost of waste is visible regardless of volume.
Schedule adherence measured against the original customer-requested date rather than the revised confirmed date reflects actual delivery performance rather than revised commitment performance.
Inventory turns measured on total inventory — including work-in-progress and slow-moving stock — rather than on finished goods alone reflects the true working capital efficiency of the business.
OEE calculated on the full available time — including planned stops that could be reduced — rather than only on planned production time reflects the full asset utilisation opportunity.
These are not more complex metrics. They are more honest ones. The complexity in the current metrics is in the exclusions and adjustments that have been added over time to make the numbers more comfortable.
Removing those adjustments makes the metrics less comfortable and more useful.
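As an illustration, here is a minimal sketch of what two of those more honest forms look like in calculation terms. The function names and inputs are illustrative, not a standard:

```python
from datetime import date

def scrap_rate_by_input(scrapped_kg: float, material_input_kg: float) -> float:
    """Scrap against material input. Overproduction cannot deflate it,
    because every extra unit consumes input as well as adding output."""
    return 100 * scrapped_kg / material_input_kg

def schedule_adherence(orders: list[tuple[date, date]]) -> float:
    """Percentage of orders shipped on or before the ORIGINAL
    customer-requested date. Revising a commitment after a miss
    cannot move this number."""
    on_time = sum(shipped <= requested for requested, shipped in orders)
    return 100 * on_time / len(orders)

# Illustrative: the second order ships late against its original date,
# and stays late no matter how many times the commitment is re-confirmed.
orders = [
    (date(2024, 3, 1), date(2024, 3, 1)),   # (requested, shipped)
    (date(2024, 3, 1), date(2024, 3, 8)),
]
print(schedule_adherence(orders))  # 50.0
```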
6. The Leadership Courage Dimension
The most important thing about measurement system integrity is that it requires leadership courage to maintain.
When the metric that has been reported to the board for three years is shown to be measuring something different from what the board believes it measures — that is a difficult conversation. It implies that performance was not as reported. It implies that improvement programmes were responding to the wrong signal. It implies that the operational picture that informed investment and resource decisions was not accurate.
Having that conversation requires the confidence that the honest picture — however uncomfortable — is more valuable than the comfortable one. And the belief that the board, and the organisation, would rather know.
Most organisations that have measurement system problems know they have them. The team leaders know the jobs are being closed early. The planning team knows the forecast accuracy window has been adjusted. The finance team knows where the premium freight is being coded.
They know. And they don’t say.
Because the culture doesn’t create space for saying it. Because the metric is attached to a performance target that is attached to a pay outcome that is attached to a leadership decision. Because the honest signal has too high a personal cost to surface.
That is a leadership problem. It is solved by leadership behaviour — by creating the explicit expectation that honest measurement is more valued than comfortable measurement, and by making that expectation credible through the response when honest measurement surfaces an uncomfortable truth.
The leader who receives a degraded metric honestly and responds by improving the measurement rather than punishing the messenger creates the conditions for honest measurement to persist.
The leader who responds to an honest signal by questioning the person who surfaced it ensures that the next signal stays hidden.
Final Thought
The numbers on your dashboard are not the performance of your business.
They are a representation of it — filtered through definitions, exclusions, adjustments, and the rational behaviour of people who are managing to a measure.
How close that representation is to the underlying reality depends on how well the measurement system was designed, how diligently it has been maintained, and whether the culture creates conditions for honest reporting when the numbers are uncomfortable.
Most operational leaders trust their metrics more than the metrics deserve.
Not because they are naive. Because the alternative — accepting that the picture they have built their operational judgement on might be substantially inaccurate — is uncomfortable enough that most people don’t go looking.
Three questions.
When did you last audit your core operational metrics — not their values, but their definitions, their scope, and whether they are measuring what they were designed to measure?
What behaviours in your organisation exist specifically to produce a good metric outcome rather than a good operational outcome — and do you know what they are?
And if an independent observer walked your operation for a week and built their own view of schedule adherence, quality, and productivity — how close would their numbers be to yours?
adam