Friday, July 31, 2009


This is the first of possibly many blog entries about statistics in baseball. A bit about me: I'm a Ph.D. student in Statistics at Cornell University, and I've been a baseball fan since I was about five years old, I suppose. I started playing fantasy baseball in the early '90s, and I've been at it ever since.

I'm not sure how often I'll have time to write, or how accessible my writing will be to the casual statistician, but I hope some people will read it, and maybe I'll actually be filling a niche: baseball analysis with more rigorous statistics than most blogs offer. It'll also be good practice for my own data analysis skills, something I don't get much of.

I've recently downloaded some data from Retrosheet, and I'm required to cite them with the following statement whenever I use the data: The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "". This incredibly useful site offers free downloads of play-by-play and summary information for the vast majority of MLB games ever played, with complete data for recent seasons.

I'll use the statistical package R for my data analysis. The first question I'll tackle is whether the Yankees' "icing" of opposing pitchers with their long seventh-inning stretch actually works. Stay tuned.

