Why “scientific data” is hurting and helping

A few weeks ago, I finished reading the landmark baseball book Moneyball.  It is an outstanding book that overlaps a bit with the Brad Pitt film, but offers far more historical perspective on the birth of sabermetrics.

Trying my best to briefly recap:  sabermetrics (or advanced metrics) grew out of a major series of papers written by Bill James back in the early 1980s.  He started with some of the basic “unanswerable” questions of baseball (Is it pitching or hitting that wins games?  What role does defense really play?), and he tried to answer them much like a scientist might.  He gathered as much raw data (player and team statistics) as he could, and then looked for correlations between those numbers and team success.  His findings were revolutionary:  the currency of baseball is the out, and therefore, you want players that make the fewest possible outs … this means that you want players with high on-base percentages (OBP), and you can ignore the big stats like batting average and RBIs.  He also found that team success correlated strongly with players who had high slugging percentages (SLG, the measure of a player’s “power” … home runs and triples increase your SLG more than doubles and singles).
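For anyone who wants to see the arithmetic, here is a minimal sketch of how OBP and SLG are computed, using the standard formulas.  The two players and their stat lines are made up purely for illustration (they are not real players or anything taken from the book), and the function names are my own.  The point is just to show how a hitter with a lower batting average can still make fewer outs.

```python
def batting_average(hits, at_bats):
    """Hits divided by at-bats."""
    return hits / at_bats if at_bats else 0.0


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = at_bats + walks + hit_by_pitch + sacrifice_flies
    return (hits + walks + hit_by_pitch) / denominator if denominator else 0.0


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG = total bases / at-bats, so extra-base hits count for more."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats if at_bats else 0.0


# Hypothetical Player A: the "classic" .300 hitter who rarely walks.
a_avg = batting_average(hits=150, at_bats=500)
a_obp = on_base_percentage(hits=150, walks=30, hit_by_pitch=5,
                           at_bats=500, sacrifice_flies=5)
a_slg = slugging_percentage(singles=100, doubles=30, triples=5,
                            home_runs=15, at_bats=500)

# Hypothetical Player B: hits only .260 but draws 100 walks.
b_avg = batting_average(hits=130, at_bats=500)
b_obp = on_base_percentage(hits=130, walks=100, hit_by_pitch=5,
                           at_bats=500, sacrifice_flies=5)
b_slg = slugging_percentage(singles=80, doubles=30, triples=0,
                            home_runs=20, at_bats=500)

print(f"Player A: AVG {a_avg:.3f}  OBP {a_obp:.3f}  SLG {a_slg:.3f}")
print(f"Player B: AVG {b_avg:.3f}  OBP {b_obp:.3f}  SLG {b_slg:.3f}")
```

With these invented numbers, Player B trails Player A by 40 points of batting average but reaches base noticeably more often, which is the kind of gap between the "big stats" and out-avoidance described above.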

The point driven home in Moneyball is that when baseball teams were presented with this wealth of tremendous data showing them how to win, they all largely ignored it until Billy Beane and Paul DePodesta needed a way to build a winning team in Oakland without a New York Yankees gangster wad in their back pockets.  They realized that the qualities that won games (getting walks to improve your OBP and hitting doubles to raise your SLG) were often under-priced in the vast marketplace of players, and that Oakland could build a winning team by going after these players.  You didn’t care so much about defense, which was difficult to quantify, and you definitely didn’t want your players trying to steal bases, unless their chance of success was extremely high.  You wanted guys who could draw walks and not hit the ball into the air to make a lot of outs.

The A’s have built teams that on a performance-to-dollar basis outperform nearly every team in Major League Baseball, but have failed to get a team to the World Series in the almost 2 decades they have been trying.  From a front office standpoint, the team has prospered.  From a fan’s standpoint, the A’s have largely failed to make a run at the World Series, and play in a (sometimes literally) cesspool of a stadium.  Sure, you were lucky in the men’s room at Wrigley Field if the only things you had to deal with were a pantsless mascot rubbing against you and the odor of urine wafting above the single trough running through the middle of the room … but at least you never had to actually walk through human waste.  The good news is that the A’s are finally getting a new stadium (bad news, Cubs fans … you still have Clark the pantsless bear as your spokesperson).

Moneyball, the book, was published in 2003, and over the past almost 15 years, baseball front office people have digested it and adjusted to it.  This summer, watching baseball, I have come to the conclusion that this philosophy has reached saturation in baseball (looking for proof?  Check out the most recent World Series winning team … a team of very interchangeable players, not really any stars or Hall of Fame caliber players, but which did come together to score runs and win games).  However, I’m not sure this is overall a great thing.  Out are things like base running and great defense, and with players trained to take more and more pitches to get that walk … is it any wonder that the average length of Major League games has become a problem?

This new approach to building teams has added a degree of parity to baseball which, with the absence of a salary cap, isn’t at all a bad thing (when the Yankees haven’t won a World Series in 8 years, and the Royals have … that is a GOOD thing).  However, in so carefully building a data-driven baseball team, I think there is a certain fun factor that is sucked out of the game.  This may not on the surface seem like a big deal (after all … if you are winning, aren’t you having fun?), but there is something underlying the patterns of data … since ultimately, only a handful of teams make the playoffs, for the teams that are trying to do this, and are not winning, what you are left with comes across as boring … and my guess is that this does not sell seats or merchandise … and that should concern the ownership and management of teams to some extent.


I am sensitive to this not because I am a fan of a team that is rebuilding and currently occupying the American League cellar for the first time in decades, but because I am a teacher.  About 15 years ago, data-driven instruction became the way every school was going to be run.  What have been the results?  I would argue that it has raised some test scores (whatever that means), and has largely turned kids off from learning, especially more challenging material.  In February 2016, there were over 5 million job openings in the United States, and roughly 8 million people unemployed.  There are lots of reasons for this, and I will not oversimplify it … but when jobs in technology, engineering, law, education, and medicine are not being filled, one has to wonder why there aren’t enough qualified people to do those jobs while we still have so many people unemployed.  Again, I won’t pretend that I have a magic bullet here, but I think part of this problem lies in the fact that we have turned off a generation of kids to learning because it has become so dry and uninteresting, thanks to data-driven instruction sucking the creativity and life out of a very human enterprise like education.  I’m not saying “get rid of all data”.  That would be as bad an overreaction as what got us here in the first place … but I have felt the pain of teachers and students being pushed by wannabe systems analysts with DEds who think they actually know good data and know how to analyze it.

I think baseball needs to remember that above all sports, it is a sport haunted by numbers.  Numbers like 755 and 262 aren’t mere numbers … they are descriptors and milestones of greatness.  Baseball has always been a numbers game, and advanced metrics add some really cool and significant points of analysis to what is going on.  However, like so many human endeavors, there are often non-measurable factors that are part of what happens, and failure to acknowledge them comes with a great cost.

To be clear … it is not the scientific approach to things that is bad … it is the failure to understand the limits of the data and analysis that is bad.  This is true for people applying statistical analysis to baseball and education, just as it was for people applying misshapen and poorly understood ideas of Darwinian evolution to societies and individual people.

