OK, let's get down to the nitty-gritty of how my Consistency Score stat works.
Its origins come from the Similarity Score, invented by Bill James and widely popularized in recent years by Sean Forman of Baseball-Reference.com.
Let's start by talking about how the Similarity Score itself works. After that, in the next post, I'll break down how the Consistency Score works as a variation of the Similarity Score.
For starters, check out Sean's explanation of James' method for the Similarity Score right here.
In brief, the Similarity Score looks at the career totals for a player (active or retired) and compares them to the totals for another player. For each stat considered, the difference between the two totals is worth a certain number of points, and those points are subtracted from a starting value of 1000. Position is also taken into account, which is why the players most similar to a given player are usually those who played the same position.
Note that James' method does not attempt to label any stats as "good" or "bad." For example, differences in walks and strikeouts alike are assigned points that get subtracted from the total.
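To make the arithmetic concrete, here is a minimal Python sketch of the subtract-from-1000 idea. The stat list, the per-stat weights in `WEIGHTS`, and the `position_penalty` argument are hypothetical placeholders chosen for illustration, not James' published values. Each weight is read as "this much difference in a career total costs one point."

```python
from typing import Mapping

# Hypothetical weights: a difference of this many units in a career total
# costs one point. Placeholders for illustration, not James' actual values.
WEIGHTS: Mapping[str, float] = {
    "G": 20,    # games
    "HR": 2,    # home runs
    "BB": 25,   # walks
    "SO": 150,  # strikeouts
}

START = 1000  # every comparison starts from 1000 points


def similarity_score(
    a: Mapping[str, float],
    b: Mapping[str, float],
    weights: Mapping[str, float] = WEIGHTS,
    position_penalty: float = 0.0,
) -> float:
    """Return 1000 minus the weighted differences between two career lines.

    Differences are absolute, so no stat is treated as "good" or "bad":
    having more walks or fewer walks costs the same number of points.
    `position_penalty` is a hypothetical stand-in for the positional
    adjustment, supplied by the caller.
    """
    penalty = sum(abs(a[stat] - b[stat]) / w for stat, w in weights.items())
    return START - penalty - position_penalty
```

For two invented career lines that differ by 20 games, 10 homers, 50 walks, and 300 strikeouts, these placeholder weights subtract 1 + 5 + 2 + 2 = 10 points, for a score of 990. Because only the size of each difference matters, swapping the two players gives the same score.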
You'll have to refer back to James' original book to understand the exact reasons why he chose the stats he did, as well as why he assigned the number of points he did.
So, in short, the Similarity Score assigns a single number to a pair of players that characterizes how similar their careers are. This is useful when comparing career achievements or, by looking at Sim Scores partway through careers, when projecting how an active player's career might continue.
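Once every pair of players has a single number, finding a player's closest comparables is just a sort. The sketch below reuses the same kind of hypothetical weights; the player names and career totals are invented for illustration only. Feeding it totals through a given season instead of full careers would give the partway-through-career comparison, assuming that season-by-season data is on hand.

```python
from typing import Mapping

# Hypothetical weights (units of difference per point), not James' values.
WEIGHTS: Mapping[str, float] = {"G": 20, "HR": 2, "BB": 25, "SO": 150}


def similarity_score(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """1000 minus the weighted absolute differences in each stat."""
    return 1000 - sum(abs(a[s] - b[s]) / w for s, w in WEIGHTS.items())


def most_similar(
    target: str,
    careers: Mapping[str, Mapping[str, float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Return the k players whose careers score highest against `target`."""
    scored = [
        (name, similarity_score(careers[target], stats))
        for name, stats in careers.items()
        if name != target
    ]
    # Highest score means most similar, so sort descending.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented career totals for three hypothetical players.
careers = {
    "Player A": {"G": 2000, "HR": 300, "BB": 800, "SO": 1500},
    "Player B": {"G": 1980, "HR": 290, "BB": 750, "SO": 1800},
    "Player C": {"G": 1500, "HR": 100, "BB": 400, "SO": 900},
}
print(most_similar("Player A", careers))
# [('Player B', 990.0), ('Player C', 855.0)]
```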
Read Part II to see how the Consistency Score works.
Wednesday, September 16, 2009
Sunday, September 13, 2009
Motivation
The first and most obvious question is "Why?", as in why do I care how consistent a player has been?
There are three main reasons why I think this tool is valuable.
1. So often there are debates over the best player in baseball or the best at a certain position. For example, who was the best offensive second baseman of the last 40 years: Joe Morgan, Ryne Sandberg, or Jeff Kent? I don't find this type of debate particularly useful. At least from a purely statistical standpoint, there isn't a team in baseball that wouldn't have been happy to have any of those guys playing for it. They were all very good players. I'm more interested in knowing how reliable given players were. For example, Jeff Kent hit 377 homers in his career without ever hitting more than 37 in a season, whereas Ryne Sandberg totaled only 282 homers, hit 40 in one season, and never hit more than 30 in any other season. My gut feeling is that Kent was a more consistent player, but I'd like to have a tool to measure that carefully.
2. I'm curious to quantify how big an aberration certain seasons were for certain players. The classic example is Brady Anderson's 1996 season, when he hit 50 homers despite never hitting even half that many in any other year of his career. Another example is the one I gave just above: was Ryne Sandberg's 40-HR season a major aberration or just a small one? At the time, could we have counted on Sandberg to reproduce that type of season? The same goes for Mark McGwire's 1991, when he batted just .201 and hit only 22 HR in 154 games. I want to quantify just how terrible that season was compared to his average year.
3. Having already run a whole bunch of these numbers, I've seen a couple of trends emerge that I think will prove very useful. First, players often take a few years to mature. When their first few seasons are quantified, there is very often a 'learning curve' in which total quantified performance builds up to a plateau, typically reached in the 3rd or 4th season. Second, there's a downward ramp that starts later in a player's career and continues until his performance trickles off so much that he ends up retiring, voluntarily or otherwise. By understanding these typical career trends, I think we can use them as a predictive tool.
As you'll see over time, the numbers can be presented graphically for each player's career, making upward and downward trends easy to spot, along with highly aberrant years. Much more to come on this.
Saturday, September 12, 2009
Welcome to Consistency Score
I'm using this blog to describe an experimental approach I'm developing to measure the season-to-season consistency of major league baseball players. You may know me as an author on the Baseball-Reference Blog, where I have been contributing stats posts for more than two years.
My qualifications for developing this new stat are somewhat questionable, heh. I've been a baseball fan for a long time and I have an advanced engineering degree. I've even taken a graduate-level statistics & probability class. In reality, though, I'm not much more than a run-of-the-mill, above-average intellect, and I'm light years behind most people with even mild sabermetric tendencies.
However, these shortcomings are exactly why I'm using this blog as a public forum for developing the stat. I guess I could instead have developed the system entirely in private and then simply unveiled it. But I think I will need the help of some smarter people out there, which is why I'm opening it up for public debate at a fairly early stage. I'm looking forward to the debate. Please don't be shy about commenting on any post and telling me why whatever I've done might be wrong!