Thursday, August 13, 2009

Fielding Independent Pitching

I was working on a longer post about how a pitcher's performance might be affected by the score of the game, or how long the previous half inning took, and I realized that the FIP formula developed by Tom Tango was a bit counter-intuitive to me. I know there are more complex formulas that can perform better, but as a quick predictor of future ERA, FIP seems to be the popular choice. What puzzled me was the use of IP rather than plate appearances (PA) - I thought using IP would bring luck back into the equation, and hence predict lower ERAs for pitchers who happened to have a low BABIP in the preceding season.
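For reference, here is a minimal sketch of FIP in its commonly quoted form, with HR weighted 13, walks plus hit batters weighted 3, and strikeouts weighted -2, all divided by IP. The function name, the default constant of 3.2, and the season line are assumptions for illustration only; the constant is normally set each season so that league FIP matches league ERA.

```python
def fip(hr, bb, hbp, k, ip, constant=3.2):
    """Fielding Independent Pitching in its commonly quoted form.

    The constant is an assumption: it is usually chosen per season so that
    league-average FIP equals league-average ERA; 3.2 is only a typical value.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season line, not a real pitcher.
base = fip(hr=20, bb=50, hbp=5, k=180, ip=200.0)

# Same HR, BB, HBP and K, but more or fewer outs recorded on balls in play:
# only the IP denominator changes, and FIP moves with it.
more_outs = fip(hr=20, bb=50, hbp=5, k=180, ip=210.0)
fewer_outs = fip(hr=20, bb=50, hbp=5, k=180, ip=190.0)
print(base, more_outs, fewer_outs)
```

The last two calls show the puzzle in miniature: identical HR, walk, and strikeout totals give a lower FIP simply because more balls in play were turned into outs.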

So I did some quick analyses using data from 1995-2008. I fit a weighted (based on IP) multiple linear regression model with season ERA as the response and the previous season's HR/IP, (BB+HBP)/IP, and K/IP as predictors. I limited the analysis to pitchers who had pitched at least 130 innings in both the predictor year and the response year. I'm sure the results wouldn't change much if I didn't use the innings quota, but I wanted to focus on starting pitchers because that's the group my next post will be about. The multiple regression assumptions of homoscedasticity and linearity appear to be reasonable:
The estimates (all highly significant) are:

Rounding off to the nearest integer, my formula agrees with FIP. Then I fit the same model, but with HR/PA, (BB+HBP)/PA, and K/PA (using PA as the weight), and got:

The higher R^2 for the innings pitched model tells us that it does a better job of predicting ERA for the subsequent season. Switching the denominator from IP to PA strips out the balls-in-play information that IP carries, and doing so reduced the quality of the fit, which suggests that pitchers do have some control over BABIP.
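The kind of weighted fit described above can be sketched as follows, using only the standard library. This is not the code used for the post: the function name is hypothetical, and the five data rows are invented, with ERA generated exactly from FIP-style weights so the fit can be checked by eye.

```python
def weighted_least_squares(rows, y, weights):
    """Solve the weighted normal equations (X'WX) beta = X'Wy.

    rows: predictor rows without the intercept; a leading 1.0 is added here.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Augmented matrix [X'WX | X'Wy].
    A = [[sum(w * x[i] * x[j] for x, w in zip(X, weights)) for j in range(p)]
         + [sum(w * x[i] * yi for x, yi, w in zip(X, y, weights))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Invented predictor-year rates: (HR/IP, (BB+HBP)/IP, K/IP), weighted by IP.
rates = [(0.10, 0.30, 0.80), (0.12, 0.40, 0.60), (0.08, 0.35, 0.90),
         (0.15, 0.25, 0.70), (0.11, 0.45, 1.00)]
innings = [200.0, 150.0, 180.0, 210.0, 135.0]
# Invented response: next-season ERA built exactly from FIP-style weights.
era = [3.2 + 13 * hr + 3 * bb - 2 * k for hr, bb, k in rates]

coefs = weighted_least_squares(rates, era, innings)
print([round(c, 3) for c in coefs])
```

To compare the IP and PA versions, the same function would be run twice, once with per-IP rates weighted by IP and once with per-PA rates weighted by PA.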

The data used here were obtained from FanGraphs.

Sunday, August 2, 2009

Yankees icing opposing pitchers?

Ever since September 11, 2001, "God Bless America" has been played at the 7th inning stretch of all games at Yankee Stadium, slightly delaying the start of the bottom of the 7th. Other stadiums reserve playing the song for special occasions such as Opening Day, Memorial Day, the 4th of July, and 9/11.

I've often read that this extended break before the bottom of the 7th "ices" the opposing pitcher, much like a timeout before a field goal attempt supposedly distresses a placekicker. This break gets even longer during the postseason when they trot out Dr. Ronan Tynan to sing an extended live version, making the visiting pitcher - and everyone else - stand still (they used to really frown on movement during the song!) while Dr. Tynan sings. I would think this delay might affect a new pitcher coming in from the bullpen more than it would affect the old one coming from the dugout - because the guy in the dugout already sat still for the top of the 7th inning. But either way, the claim is that the Yankees seem to do quite well in the bottom of the 7th at home, especially in the playoffs, since 9/11/2001.

I've seen a couple of analyses confirming this effect (I won't name names) that don't really seem to get at the issue. I think the simplest way to test for a "God Bless America" effect is to take into account the year-to-year strength of the Yankee offense, and to limit the analysis to Yankee Stadium games in order to control for park effect. A more complicated analysis could of course take into account opposing pitching strength, injuries, etc.

Rather than looking directly at runs, it would be preferable to look at a sabermetric statistic like base runs, but I'm going to be lazy and look at runs directly. The number of runs the Yankees score in the bottom of the 7th inning of a given game is a count, so it's natural to model it with the Poisson distribution, incorporating an offset for the strength of the Yankee offense. I'll estimate the strength of the offense as the number of runs they score during the whole regular season at home; this actually has a bigger range than I thought, going from 520 runs in 2007 to just 412 runs in 2008 - no doubt partly due to A-Rod's injury. I'll start by considering just the regular season games played between 1995 and 2008. The required quasi-Poisson regression model is:

log(E(runs)) = log(strength) + b*GBA,

where E(runs) denotes the expected value (or mean) of 7th inning runs, and GBA is 1 if the game took place after 9/11/2001, and 0 if before. The results (from R) for b are:

Estimate     p-value
-0.002675    0.98

(Dispersion parameter for quasipoisson family taken to be 2.29)
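For readers who want to see the mechanics, here is a minimal sketch of this kind of fit. It assumes the model also carries an intercept a (R adds one by default, though the equation above does not show it). With an intercept, an offset, and a single 0/1 predictor, the Poisson estimates have a closed form, so no iterative fitting is needed. The function name and the eight games of data are made up for illustration.

```python
import math


def fit_gba_model(runs, strength, gba):
    """Fit log(E(runs)) = a + log(strength) + b*GBA (intercept a is assumed).

    With one 0/1 predictor the Poisson score equations force each group's
    fitted total to equal its observed total, which gives a closed form.
    gba must hold the integers 0 or 1.
    Returns (a, b, dispersion, se_b).
    """
    tot_runs = [0.0, 0.0]
    tot_strength = [0.0, 0.0]
    for y, s, g in zip(runs, strength, gba):
        tot_runs[g] += y
        tot_strength[g] += s
    rate = [tot_runs[g] / tot_strength[g] for g in (0, 1)]
    a = math.log(rate[0])
    b = math.log(rate[1] / rate[0])
    # Quasi-Poisson dispersion: Pearson chi-square over residual degrees of freedom.
    pearson = 0.0
    for y, s, g in zip(runs, strength, gba):
        mu = rate[g] * s
        pearson += (y - mu) ** 2 / mu
    dispersion = pearson / (len(runs) - 2)
    # Standard error of b, inflated by the dispersion estimate.
    se_b = math.sqrt(dispersion * (1.0 / tot_runs[0] + 1.0 / tot_runs[1]))
    return a, b, dispersion, se_b


# Made-up games: 7th-inning runs, season home-run total as strength, GBA flag.
runs = [0, 2, 1, 0, 0, 3, 0, 1]
strength = [450, 450, 480, 480, 500, 500, 520, 520]
gba = [0, 0, 0, 0, 1, 1, 1, 1]

a, b, dispersion, se_b = fit_gba_model(runs, strength, gba)
print(b, dispersion, se_b)
```

A value of b near zero says the scoring rate per unit of offensive strength is about the same before and after the song was added.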

The Yanks are actually scoring at a slightly higher rate in the bottom of the 7th since 9/11/2001 (not shown), but the results above show that after controlling for their increasing offensive strength, they are scoring at a slightly lower rate since the addition of the song! The change is nowhere near significant though, with a p-value of 0.98. Either way, the grumbling about an unfair advantage appears to be unfounded. What about a playoff advantage though? That's what the opposition really gripes about... it turns out no complicated modeling is necessary to answer that question.

Between 1995 and 2000, the Yanks played 31 home playoff games, and scored 29 7th inning runs, for an average of 0.94. They scored at least once in the 7th 14 times. Between 2001 and 2008, the Yanks played 32 home playoff games, and scored 15 7th inning runs, for an average of 0.47. In those years they scored at least once in the 7th only 9 times!
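The arithmetic behind those figures is easy to reproduce. The sketch below uses only the totals quoted above; the variable and function names are my own.

```python
# Playoff totals quoted in the post:
# (home games, 7th-inning runs, games with at least one 7th-inning run)
periods = {
    "1995-2000": (31, 29, 14),
    "2001-2008": (32, 15, 9),
}


def summarize(games, runs, scoring_games):
    """Return (runs per game, share of games with a 7th-inning run)."""
    return round(runs / games, 2), round(scoring_games / games, 2)


for label, totals in periods.items():
    print(label, summarize(*totals))
```

The second number in each pair is the share of home playoff games in which the Yanks scored at all in the 7th.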

So it turns out that Dr. Tynan's long performance of "God Bless America" doesn't really throw the opposing pitchers off their game any more than being late in the game in the playoffs at Yankee Stadium has already done. If anything, it seems like it might help the opposition... of course the sample size is pretty small.

Or... maybe the live song has helped the Yanks, but A-Rod has single-handedly mitigated the benefits! :)

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "".