Examples of Bad Metrics
There are myriad books and websites describing the Top N Metrics in a particular area, but very few tell you what not to do. This article will show you specific examples of metrics gone bad, what’s wrong with them, and what you can do to make them better.
First, it’s important to understand the philosophical and economic downside of metrics systems. The book, The Tyranny of Metrics, by Jerry Z. Muller does a great job of capturing the issue.
The book talks about a couple of thinkers who called out the dangers of over-indexing on metrics with two related laws. First is Campbell’s Law:
The more a quantitative metric is visible and used to make important decisions, the more it will be gamed—which will distort and corrupt the exact processes it was meant to monitor.
An adaptation of Campbell’s Law
And the second is a similar, more simplified version of the same, by Goodhart:
Anything that can be measured and rewarded will be gamed.
An adaptation of Goodhart’s Law
Put simply: the more visible, quantifiable, and consequential a metric is, the more vulnerable it is to gaming, and the more toxic it can become to its original purpose.
But neither the author of this book nor I am saying to avoid metrics, or that they’re inherently harmful. We’re simply saying that you need to avoid metrics worship, or what Muller calls “Metrics Fixation”.
As with so many important things in life, the key is balance—in this case between measurement and judgment. We can and should use metrics where appropriate, but we can’t allow them to turn into a religion.
Examples of real-world metrics gone bad
This is ultimately a question of economics (as in the study of incentives, not finance), because economics is about understanding how policy changes produce both desired effects and unwanted externalities.
The best way to illustrate this problem is with examples of bad metrics that produced unwanted outcomes. Here are some of the most cringe-smile-invoking examples.
Number of venomous snakes
A leader in India said too many people were dying from venomous snakes, so he offered money to anyone who brought him a dead one.
Unintended Negative Result: People started breeding venomous snakes in private, so they could kill them and bring them to the government.
A Better Metric: Reward reductions in the number of reported deaths from venomous snakes. But realize that this can—and likely will—cause additional effects (like people being paid to classify snakebite deaths as something else).
Stop taking hard cases
Surgeons are often judged by how often there are complications or deaths in their surgeries, which affects their marketability and insurance rates.
Unintended Negative Result: Many surgeons stop taking high-risk or complicated cases, which results in people who really need help getting inferior care.
A Better Metric: Incorporate a rating of difficulty, risk, or complication in the calculation, and maybe even incentivize the courage to take on hard cases.
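One common way to incorporate difficulty is an observed-to-expected (O/E) ratio, where each case contributes its predicted complication risk to the denominator. Here is a minimal sketch of that idea; the case mixes, risk numbers, and the `risk_adjusted_rate` helper are all invented for illustration, not taken from any real scoring system.

```python
def risk_adjusted_rate(cases):
    """Observed-to-expected (O/E) complication ratio.

    Each case is (had_complication: bool, expected_risk: float), where
    expected_risk is the predicted probability of a complication given
    the case's difficulty. A ratio below 1.0 means the surgeon did
    better than expected for the mix of cases they took on.
    """
    observed = sum(1 for had, _ in cases if had)
    expected = sum(risk for _, risk in cases)
    return observed / expected if expected else 0.0

# Surgeon A avoids hard cases: 2 complications in 100 easy cases (2% expected risk each).
easy_only = [(i < 2, 0.02) for i in range(100)]

# Surgeon B takes hard cases: 10 complications in 100 cases at 15% expected risk each.
hard_cases = [(i < 10, 0.15) for i in range(100)]

print(round(risk_adjusted_rate(easy_only), 2))   # 2 observed / 2.0 expected  -> 1.0
print(round(risk_adjusted_rate(hard_cases), 2))  # 10 observed / 15.0 expected -> 0.67
```

Under this metric the surgeon taking hard cases scores better, even with five times the raw complication count, because the denominator accounts for the risk they took on.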
Teaching to the test
Governments in the last couple of decades have focused on making sure more students can hit a minimum level of competency in subjects such as English and Math.
Unintended Negative Result: Many schools have taken this to an extreme and spend nearly all their classroom time teaching to the test, which strips away freedom and enthusiasm and ultimately costs students their curiosity and creativity.
A Better Metric: Find ways to encourage creativity and curiosity, as well as rote learning, since those are a big part of what we’re trying to foster in our children as a springboard for life-long learning.
Additional metrics that directly violated their purpose
Chinese peasants used to be paid for finding dinosaur bones, which actually led them to break every bone they found into multiple pieces so they could be paid multiple times.
Security managers prioritizing bug volume over bug quality, leading to more bugs that don’t matter and less time spent on the ones that do.
Salespeople being rewarded based on number of leads, which often creates tons of poor, unqualified leads that take up quality time that should have been spent elsewhere.
Wells Fargo massively incentivized the metric of “new accounts”, which caused them to set up thousands of fake accounts, ultimately resulting in major lawsuits and financial impact.
Hospitals penalized for readmissions started treating returning patients as outpatients instead of inpatients.
Police departments labeling serious crimes as misdemeanors to show progress.
A number of governments with air pollution problems have started alternating which cars can be on the roads each day by even and odd license plate numbers, which unfortunately led many to buy an additional vehicle so they could drive every day.
Manufacturing workers being told “reported incidents WILL go down”, which doesn’t necessarily mean that fewer people will get hurt, but that if they do get hurt they’ll find a way to avoid reporting it.
Glass plant workers were told to produce as many square feet of sheet glass as possible, and soon started making it so thin that it wasn’t usable for anything.
The mortgage loan situation is a great example of a good-natured metric causing great harm.
Before the mortgage crisis of 2008, many banks were given metrics for loans to non-traditional borrowers (people who couldn’t normally qualify for a mortgage), and the result was billions in loans that couldn’t be paid back.
Discussion
There’s a similar problem in teaching an AI what to value if it becomes sentient and super-intelligent: you can’t be too specific, or you could cause great harm.
For me the key here is that metrics should 1) tell us the state of the world we care about, and 2) track the spirit of what we desire rather than the letter. We saw great examples of that above, where we thought a quantitative, visible metric got us what we wanted, when in fact we wanted something broader and more difficult to describe.
Another example of this comes from Daniel Kahneman’s research on happiness, where he argues that it’s not happiness people are looking for, but rather satisfaction that their lives are going well in the long-term.
If we were to measure smiles, for example, as a proxy for happiness, how would that track with satisfaction? It’s those disconnects between the measured and that which we truly care about that are crucial to avoid in any metrics program.
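That disconnect can be sketched as a toy decision problem. The strategies and scores below are invented numbers, purely for illustration: once a proxy is what gets rewarded, the strategy that maximizes the proxy is often not the one that creates real value.

```python
# Toy illustration of Goodhart's Law with made-up numbers.
# Each strategy maps to (proxy_score_it_produces, true_value_it_creates).
strategies = {
    "do the real work": (70, 70),   # proxy tracks reality
    "game the metric":  (95, 10),   # high proxy, little real value
    "ignore the metric": (40, 60),
}

best_for_proxy = max(strategies, key=lambda s: strategies[s][0])
best_for_goal = max(strategies, key=lambda s: strategies[s][1])

print(best_for_proxy)  # game the metric
print(best_for_goal)   # do the real work
```

The moment the proxy is rewarded, “game the metric” wins the optimization even though “do the real work” is what we actually wanted, which is exactly the smiles-versus-satisfaction gap.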
The other thing to avoid is having the entire measurement effort metastasize into a tumor that eats the organization. Possible solutions there could include hard limits on the number of metrics, the time allowed to be spent on them, and/or the budget for running the program. As Muller points out, organizations with high turnover and a lack of direction could confuse metrics with leadership and do little else.
Summary
Metrics are a good way to align a team around a goal.
The more numeric, visible, and reward-tied a metric is, the more likely it is to be gamed and turn toxic to its original purpose.
When you use a metrics program, be sure to periodically ensure that undesired externalities have not emerged as a result. And be prepared to go digging, since the negative effects could be well-hidden.
Moderation is key. Use metrics, but don’t let them control you or become a substitute for judgment.
Remember to constantly revisit the spirit of what you’re trying to attain, and continuously ask yourself whether the tangible things you’re tracking are high-signal proxies for those goals.