Why metrics deteriorate — and how they take society with them
There is a strong correlation between how fast someone can run and how good they are at soccer, for obvious reasons. Speed is useful on the field, and in the general population it also correlates closely with other athletic attributes. If you take a random sample of the general population, faster sprinters will also have better coordination, better reflexes and better endurance, among other advantages.
If you had to very quickly pick soccer teams from a large group of hopefuls, you could do far worse than racing them from one end of the pitch to the other and sorting them in the order they finish. If you do this with a random sample of 100 people from the general population, the first 11 will be a far better team than the second 11, who will in turn be far better than the next 11. This shortcut will certainly yield results far faster than evaluating 100 people's ball skills, and it removes the numerous elements of subjectivity and bias involved in selection.
In other words, sprinting speed is a useful metric for team selection at that level. My old primary school, in good public school tradition, had a soccer coach who was actually just a teacher with a bit of free time and no knowledge of soccer whatsoever. When he used this metric his teams’ results improved, at least initially.
It went predictably wrong, however, when players started protecting their spot on the team by putting more energy into sprint training than soccer practice, eventually leaving the school with a decent batch of sprinters who lost soccer matches by lopsided margins.
This is reminiscent of the Observer Effect in physics, which states that the act of observing a phenomenon changes that phenomenon, but the effect is far stronger when you go so far as to attach an incentive. The principle can be reworded for sociological purposes: Metrics based upon a correlation affect that correlation. In most instances, the correlation is weakened, with the net result that metrics tend to deteriorate over time*.
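The weakening of a correlation under an attached incentive can be sketched with a toy simulation. Every number below is invented purely for illustration: each agent has a latent "ability" that drives both true performance and a proxy metric, and once the proxy is incentivized, agents divert effort into gaming it, which dilutes the proxy's correlation with ability.

```python
# Toy model of metric deterioration. All parameters are illustrative
# assumptions, not empirical estimates.
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


random.seed(1)
n = 10_000
ability = [random.gauss(0, 1) for _ in range(n)]

# Before the incentive: the proxy metric mostly reflects ability, plus noise.
proxy_before = [a + random.gauss(0, 0.5) for a in ability]

# After the incentive: agents spend effort gaming the proxy instead of doing
# the underlying work; skill at gaming is assumed independent of ability.
gaming = [random.gauss(0, 1) for _ in range(n)]
proxy_after = [0.4 * a + 1.5 * g + random.gauss(0, 0.5)
               for a, g in zip(ability, gaming)]

print(f"correlation before incentive: {pearson(ability, proxy_before):.2f}")
print(f"correlation after incentive:  {pearson(ability, proxy_after):.2f}")
```

Under these assumed weights, the proxy's correlation with ability drops from roughly 0.9 to roughly 0.25: the metric still exists and is still being measured, but it no longer tracks the thing it was chosen to track.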
Society is replete with examples of this phenomenon, but I will focus on one which has come into stark focus in the wake of the COVID-19 pandemic: the practice of science.
For many generations, it has been accepted as obvious that scientists, especially competent scientists, are broadly beneficial to society in a multitude of ways both tangible and intangible. Understandably, governments all over the world allocate significant budgets to supporting the sciences, ostensibly for the wide-ranging benefit of society at large. The great difficulty of this lies in all those intangible elements; a scientist's overall value to society is impossible to measure objectively. And so metrics of scientific output are used as a stand-in for all of the immeasurable and unpredictable facets of science.
Two metrics are prized above all others: a scientist's number of peer-reviewed publications, and the number of citations garnered by those publications. These metrics are based on the long-ago observation that the very best scientists published a good deal of work, and that the most impactful scientific work was frequently cited by other scientists.
Sprinting speed, at least, has considerable value on the soccer pitch, whereas publications and citations do not carry any intrinsic value to society. These metrics, therefore, are quite clearly worse-formulated than that erstwhile soccer coach’s selection criterion.
With global population well over seven billion, humankind has drastically exceeded the natural ecosystem's capacity to sustain us, and we are rapidly consuming finite resources while population and demand grow unabated. We are now dependent on science not just for the advancement of our society but for its very survival. We ought to be doing a better job of directing its future than a part-time coach who doesn't understand soccer.
The metrics by which we measure science are transparently poor, and bound to deteriorate on a continuous basis. Consider some of the behaviours that they incentivize:
For one, innovation is not rewarded. Genuinely new ideas are difficult to publish and take time to develop. A far more reliable success strategy is to stick to well-established fields. And the more people there are working on the same topic, the better; that guarantees citations for everyone.
For another, the most successful researchers are not really researchers, but managers. Twenty solid grad students can produce more output than even the most inspired scientist, and so, under this incentive scheme, a researcher's time is better spent on fund-raising and administration than on actually doing research. This phenomenon tends to remove the best scientists from actual science, reducing the quality of the work, but its effects run far deeper. We depend upon top-rated scientists, particularly in times of crisis, for clear thinking and quality decision-making, both of which have been quite remarkably lacking in the face of the COVID-19 crisis.
The assumption that top scientists are able to advise competently on pressing issues is rooted in the assumption that they are innovative thinkers with minds kept sharp by the intellectual challenges of cutting-edge research. Perhaps this was true when the system was conceived, but with the metrics and incentive structure in place, the ongoing trend must be toward a state of affairs where the top-rated scientists are those who have most successfully exploited a set of perverse incentives. The logical expectation is that those who rise to the top will be those who fit a certain profile, dictated by the incentives. Absent from that profile is the kind of creativity and innovative thinking that we still expect and, sometimes, desperately need from science. Also absent is a strong sense of responsibility to create value for society, because the ways in which scientists genuinely create value for society are poorly measured by this set of metrics, and poorly rewarded by the associated incentives.
The value of scientists to society is actively eroded by the metrics we use to measure it. The net result is an academic system that is more and more expensive, but less and less valuable. At its extremes it becomes an effete form of state capture, where tax dollars intended to benefit society are cynically extracted through exploitation of a set of rules that are long overdue for drastic revision.
There is another dimension to the widespread use of metrics: the actual mechanisms of their implementation distort the behaviours they're measuring. If job performance is to be measured according to metrics, then what is rewarded is skewed not just toward those metrics but also toward compliance with reporting procedures. This diverts energy away from productivity and toward paperwork, and it also favours certain personality profiles. Some people become highly frustrated with administrative tasks that are unrelated to the fundamental goals of what they're doing, while others thrive on them. A metric-driven society will naturally favour and elevate the latter, not because of better performance but because of compliance and better reporting.
This phenomenon disproportionately affects fields such as art and science, where free and creative thinking are crucial, but it applies across all fields and industries. Our society is increasingly focused on replacing subjective judgment with measurable metrics, for any number of reasons, many of them noble and well-intentioned, but with the unintended side-effect of creating systems that are increasingly prone to exploitation, and which deteriorate over time.
* There are instances where correlations become stronger once metrics are applied, which can result in a beneficial feedback loop with positive results, or which can more deeply entrench a pre-existing disparity.