Statistics in Baseball: July 2010

Saturday, July 17, 2010

Late-inning defensive substitutions

Sometimes a manager will remove a poor fielder from the game when his team is winning, believing the defensive improvement is enough to offset the loss of this player's bat. This usually happens in the 9th inning. Here's a short discussion about it: Tango analysis. My plan was to consider the following:

win: a binary response for logistic regression indicating if the team held on to win the game,
I.def: an indicator of whether the player was substituted for,
dist: the number of spots in the order until this player is due to bat again. Presumably as this increases, the defensive substitution starts to look better and better,
lead: either one or two runs,
inn: inning - limited to top of the 9th and bottom of the 9th. I didn't want to include the 8th and get into dealing with players who weren't replaced in the 8th but then were in the 9th. The replacement usually happens in the 9th inning anyway,
player: identity of the original player; as in the pinch running analysis, I'm going to lump all the replacements together,
opp: the opposing team,

and to limit the analysis to the AL games to avoid the implications of double switches that happen only in the NL.

To lead into my (short) analysis, I want to continue the selection bias discussion from my previous post. In that situation, requiring all four pairwise combinations of win/loss with pinch runner/no pinch runner to exist for each player created a selection bias and led to an overestimate of the PR main effect. I don't think it affected the analysis very much because my interest was in estimating the sum of this main effect and the average player:PR interaction for 25 good hitters, and this sum should not be affected by a biased main effect - the sum for an individual player estimates his personal PR effect and this is not dependent on any bias in the estimate of the average.

In today's analysis I am looking at defensive replacement by teams in the lead, so they go on to win the game - with or without the replacement - a vast majority of the time. Hence, if I set up the model in the same way as I set up the pinch runner model, with win as the response in a logistic regression, this selection bias will be quite severe. The pairwise combination most often lacking is the loss/replacement, and so many players whose replacement only ever led to victory get deleted from the sample. Proceeding as if everything was normal leads to a hugely negative defensive replacement main effect. Again my interest would be in the sum of this effect and the interaction effects of good hitters, so this is not an insurmountable obstacle.

The obstacle seems to be the lack of repetitions. I was picturing managers replacing their good hitters in close games all the time, and then when extra innings roll around, being left without their good hitters. But as an exploratory analysis I looked at 25 years of data, limited to players hitting in the top five in the order (lower than that the replacement might well be as good as the guy he's replacing) who've been substituted for in the 9th inning with a one or two run lead at least one time (not imposing any condition on having lost at least one game, and so unable to estimate a player effect), and found that out of the 36844 cases remaining, only 629 were defensive replacements - this amounts to less than 2 substitutions per season per team. And of those 36844, only 1490 times did their spot come to bat again - 1428 times with them in it, and 62 times with their replacement in it. At least you'd think that the team wins more often when the player hasn't been replaced, right? 657 wins/1428 games (46%) batting for himself, and 30 wins/62 games (48%) with the replacement - a statistically insignificant difference. Unsurprisingly, when I fit the full logistic regression model described above (again I limited to the top five in the order and fit without player:I.def interaction), the I.def effect was not anywhere close to significant.
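As a rough check on the "statistically insignificant" comparison above, here is a minimal sketch of a pooled two-proportion z-test on the counts quoted in the paragraph (657 of 1428 versus 30 of 62). It is written in Python rather than the R used for the analysis, the function name is made up for illustration, and it is not necessarily the test that was actually run.

```python
import math

def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = wins_a / n_a
    p_b = wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided normal tail area via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts from the paragraph above: original hitter kept vs. replaced.
z, p = two_proportion_z(657, 1428, 30, 62)
print(f"kept: {657 / 1428:.3f}  replaced: {30 / 62:.3f}  z = {z:.2f}  p = {p:.2f}")
```

With only 62 games in the replacement group, the standard error of the difference is several times larger than the difference itself, so the test has very little power here.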

This brute force approach to try to get around the noisy data is not going to work here. It certainly looks like the defensive replacement effect, if any, is quite minimal anyway. I'm satisfied for now, but to answer the question properly, I'd have to be able to measure exactly how much a defensive replacement helps the defense, and I don't have the data to do that right now.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".

Sunday, July 4, 2010

Are pinch runners effective? Part 3 of 3

In this entry I'm going to see whether pinch running is a good tradeoff: increasing the probability of scoring by enough to offset the loss of offense in extra innings. My previous post saw how much pinch running helped the chances of scoring - it was more than I thought. Now we'll complete the second half of the problem.

My original plan was to look only at future innings where the PR (or original hitter if he wasn't pinch run for) came to bat. This would decrease the variance inevitably created by considering opponents' runs or runs in innings when this spot didn't come to bat. Then by finding the probability of this spot returning to the plate in each different situation, I could isolate the negative side of the PR effect, which could then be combined with the estimate of the positive side that I already have. I was hoping this would let me separate the effects of pinch running and the defensive replacement that surely occurs along with it, but these probabilities of returning to bat are going to depend on the opponent's run scoring and this depends a bit on the defense. The main problem though, would be to connect these separate analyses of the good and the bad of pinch running with a sensible variance estimate that allows for the dependence between the two models.

So I scrapped this idea and decided to start by just fitting a multiple logistic regression with win/loss as the binary response, hoping that the noisiness of the data would be offset by the fact that I had the data for games all the way back to 1952. The predictors I considered were:

I.PR: an indicator of whether there was a pinch runner,
lead: the score difference between the two teams (either -1, 0, or 1),
inn: either top 8th, bottom 8th, top 9th, or bottom 9th. I lumped extra innings in with the 9th inning,
outs: the number of outs when the runner first got on base (this is categorical because there's no reason outs would be related linearly to the log odds of scoring),
I.2nd: an indicator of whether the runner was on 2nd base (the alternative is the reference level, 1st base),
player: identity of the original runner (for parsimony I'm lumping all pinch runners together as being fast guys),
opp: the opposing team. This doesn't make much difference because unless you're going to control for which pitcher you're facing, the variance in opposition even within the same team is pretty big due to different pitchers across different generations.

I looked at data for the past 20 years (1990-2009). I used sum contrasts for players (so the average player would have a zero estimate for his effect) and treatment contrasts for all the other predictors. The significant interactions were I.PR:lead, inn:lead, outs:lead, and I.2nd:outs. Notice that there is no data for bottom9th:lead1, so this did not have an estimate; however, that did not preclude proper estimation of the remaining inn:lead effects. This time I wanted to include the I.PR:player interaction because this helps measure exactly what I want to know. Before explaining why this is so, it's best to explain the meaning of the main effects.

The main effect of player measures mostly how good his team is, particularly the players hitting immediately after him. A player effect of zero means his team is about average. The main effect of I.PR measures how much pinch running for the average player increases the log odds of winning. Note that this "average" is not the same as the other one: this one refers more to the speed of the player himself. It also incorporates how good a hitter he is - because he might return to the plate in extra innings - but as we will see, it's mostly about his speed, just as the player main effect is mostly about his team rather than himself.

But the interaction of I.PR and player provides information about pinch running for this player relative to the average; if it's positive, a PR should be employed for this player at least as often as the average player; if it's negative, the PR should be employed less often than the average. In fact if the sum of the interaction coefficient and the I.PR coefficient is negative, the log odds of winning the game are decreased when the player is pinch run for.

(Aside: the minor selection bias I mentioned in part 2 is exacerbated here. To estimate the player:I.PR interaction, we need all four pairwise combinations of win/loss with PR/no PR for each player, so any player who is lacking one of these combinations is deleted. By far the most likely to be missing are the two involving PR. About 55% of these deletions were loss/PR - a higher percentage of losses than the true PR population contains - so the PR main effect in my analysis has a positive bias. But my goal is to look at player-specific PR effects, and those are based on the sum of the PR main effect with the interaction term, a sum which should be invariant to any bias in the main effect - if the main effect is too high, the interaction estimate will just be lower to balance it out. I wouldn't expect the parameter estimates for the effects not involving PR to be biased.)

I don't want to consider just one player at a time because the variance of the interaction coefficient estimates is too large to make an informed conclusion. But what I can do is average the interaction effects of many players together. Without looking at the data, I picked a list of 25 players who I thought were good hitters, but in general pretty slow runners:

Berkman, Lance
Bonds, Barry
Cabrera, Miguel
Dunn, Adam
Giambi, Jason
Guerrero, Vladimir
Gwynn, Tony
Helton, Todd
Holliday, Matt
Howard, Ryan
Jones, Chipper
Kent, Jeff
Lee, Carlos
McGriff, Fred
McGwire, Mark
Ordonez, Magglio
Ortiz, David
Palmeiro, Rafael
Piazza, Mike
Ramirez, Manny
Rodriguez, Ivan
Sheffield, Gary
Sosa, Sammy
Thomas, Frank
Youkilis, Kevin

Most of these guys have been pinch run for tens of times, and not pinch run for over 100 times. If their average interaction coefficient estimate was significantly negative, that would tell me they should be pinch run for less often than the average player. (I'm being vague because we don't know yet if the average player should be pinch run for.) But their average I.PR:player coefficient estimate is 0.26 with a standard error of 0.18. I guess I could take an even larger sample of good, slow hitters, but even with the big standard error, I'm convinced that this interaction estimate has a lot more to do with speed than hitting ability - the results from part 2 really did show that having a PR helps a lot. The estimate of the main effect of PR is also positive, but as I mentioned earlier, it's biased, so I won't draw any conclusions about the average players here (I did do a separate analysis without interaction, and hence without a noticeable selection bias, and the PR effect for the average player was still significant).
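To put the 0.26 estimate and its 0.18 standard error in perspective, here is a small Python sketch of the usual Wald z-statistic and a normal-approximation 95% interval on the log-odds scale. It uses only the two numbers quoted above and assumes approximate normality of the averaged coefficient.

```python
import math

estimate = 0.26  # average I.PR:player coefficient for the 25 hitters
std_err = 0.18   # its standard error, as quoted in the text

z = estimate / std_err
# Two-sided p-value under a normal approximation.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
# Approximate 95% interval on the log-odds scale.
ci = (estimate - 1.96 * std_err, estimate + 1.96 * std_err)

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.2f}, "
      f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval covers zero, so the data do not rule out these hitters having an average PR effect, but most of it lies on the positive side.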

The table below is based on the average of the 25 aforementioned players. It gives the probabilities of winning in each of the 66 different situations (don't worry, I used a loop in R to make the html code so I didn't have to type it all). The situation column is as follows: lead, inning, base, outs. The p-values are for the 2-sided test between the two probability estimates.

situation	P(win) no PR	P(win) PR	p-value
-1,t8,1,0	0.325	0.379	0.217
-1,t8,1,1	0.232	0.277	0.226
-1,t8,1,2	0.145	0.177	0.236
-1,t8,2,0	0.399	0.457	0.210
-1,t8,2,1	0.290	0.341	0.221
-1,t8,2,2	0.156	0.190	0.235
-1,b8,1,0	0.442	0.501	0.206
-1,b8,1,1	0.332	0.386	0.217
-1,b8,1,2	0.218	0.261	0.228
-1,b8,2,0	0.521	0.580	0.200
-1,b8,2,1	0.401	0.459	0.210
-1,b8,2,2	0.233	0.278	0.227
-1,t9,1,0	0.216	0.259	0.227
-1,t9,1,1	0.148	0.180	0.234
-1,t9,1,2	0.089	0.110	0.240
-1,t9,2,0	0.275	0.325	0.222
-1,t9,2,1	0.190	0.229	0.230
-1,t9,2,2	0.096	0.118	0.240
-1,b9,1,0	0.296	0.348	0.219
-1,b9,1,1	0.209	0.251	0.228
-1,b9,1,2	0.129	0.158	0.236
-1,b9,2,0	0.367	0.423	0.213
-1,b9,2,1	0.263	0.311	0.223
-1,b9,2,2	0.139	0.170	0.236
0,t8,1,0	0.551	0.662	0.009
0,t8,1,1	0.468	0.583	0.011
0,t8,1,2	0.401	0.516	0.013
0,t8,2,0	0.628	0.729	0.007
0,t8,2,1	0.543	0.654	0.009
0,t8,2,2	0.421	0.537	0.013
0,b8,1,0	0.723	0.806	0.006
0,b8,1,1	0.652	0.749	0.007
0,b8,1,2	0.588	0.694	0.008
0,b8,2,0	0.782	0.851	0.005
0,b8,2,1	0.716	0.801	0.006
0,b8,2,2	0.608	0.712	0.008
0,t9,1,0	0.565	0.674	0.009
0,t9,1,1	0.482	0.597	0.011
0,t9,1,2	0.415	0.530	0.013
0,t9,2,0	0.641	0.740	0.007
0,t9,2,1	0.557	0.667	0.009
0,t9,2,2	0.435	0.551	0.012
0,b9,1,0	0.739	0.819	0.006
0,b9,1,1	0.670	0.764	0.007
0,b9,1,2	0.607	0.711	0.008
0,b9,2,0	0.796	0.861	0.005
0,b9,2,1	0.733	0.814	0.006
0,b9,2,2	0.627	0.728	0.007
1,t8,1,0	0.796	0.879	0.001
1,t8,1,1	0.742	0.843	0.001
1,t8,1,2	0.731	0.835	0.001
1,t8,2,0	0.843	0.909	0.001
1,t8,2,1	0.795	0.879	0.001
1,t8,2,2	0.747	0.846	0.001
1,b8,1,0	0.917	0.954	0.001
1,b8,1,1	0.891	0.938	0.001
1,b8,1,2	0.885	0.935	0.001
1,b8,2,0	0.939	0.966	0.001
1,b8,2,1	0.917	0.954	0.001
1,b8,2,2	0.894	0.940	0.001
1,t9,1,0	0.877	0.930	0.001
1,t9,1,1	0.840	0.907	0.001
1,t9,1,2	0.832	0.902	0.001
1,t9,2,0	0.907	0.948	0.001
1,t9,2,1	0.876	0.929	0.001
1,t9,2,2	0.843	0.909	0.001

A couple of interesting things I notice:

the win probability estimates when trailing by one or when tied are higher with a runner on 1st and no out than with a runner on 2nd and one out. Newsflash: bunting is dumb in general, even when you only need one run.
some of the PR effects are shockingly large compared to what I had estimated in part 2 for the change in probability of the run scoring. I can only say that I double checked these estimates, and that the estimates from part 2 hadn't allowed for the slowness of the hitter. The other difference I can think of is that (for convenience) my code counted the runner as having scored even if he'd been erased by a fielder's choice and a subsequent runner scored. If this was more common in non-PR situations, it could have led to an understatement of the true PR effect in part 2.
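The bunting remark can be checked directly against the table. The Python sketch below copies the no-PR win probabilities for "runner on 1st, no out" and "runner on 2nd, one out" in each lead and half-inning, then takes the difference. It assumes a bunt always succeeds in trading an out for a base, which flatters the bunt; the dictionary layout is only for illustration.

```python
# P(win) with no pinch runner, copied from the table above.
# Key: (lead, half-inning).
# Value: (runner on 1st with 0 out, runner on 2nd with 1 out).
NO_PR = {
    (-1, "t8"): (0.325, 0.290), (-1, "b8"): (0.442, 0.401),
    (-1, "t9"): (0.216, 0.190), (-1, "b9"): (0.296, 0.263),
    (0, "t8"): (0.551, 0.543), (0, "b8"): (0.723, 0.716),
    (0, "t9"): (0.565, 0.557), (0, "b9"): (0.739, 0.733),
    (1, "t8"): (0.796, 0.795), (1, "b8"): (0.917, 0.917),
    (1, "t9"): (0.877, 0.876),
}

# Win probability given up by a (perfectly executed) sacrifice bunt.
cost = {key: round(first - second, 3) for key, (first, second) in NO_PR.items()}
trailing_or_tied = [d for (lead, _), d in cost.items() if lead <= 0]

for (lead, inn), d in cost.items():
    print(f"lead={lead:+d} {inn}: bunt costs {d:.3f}")
```

The cost is positive in every trailing or tied situation and essentially zero when leading by one.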

In conclusion, I am convinced that pinch running in the late innings is an effective strategy. This was a bit surprising to me, but apparently the small chance of extra innings (and the new, poorer hitter being unable to contribute to a run that the good hitter could have) is outweighed by the relatively large improvement in run scoring in the current inning. The effect seems to get even bigger when the hitter was slow, even if he's a great hitter.

I think my next entry will be about late-inning defensive replacements.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".

Saturday, July 3, 2010

Are pinch runners effective? Part 2 of 3

For the introduction (part 1), see my previous post. In this entry I'm going to estimate the increase in the probability of scoring provided by a pinch runner (PR). My next post (part 3) will attempt to answer the ultimate question: is this increase in probability enough to justify weakening the offense for future innings?

After some R coding to count the necessary information, I fit a multiple logistic regression model with:

I.run: binary response, the indicator of whether the run scored,
I.PR: an indicator of whether there was a pinch runner,
lead: the score difference between the two teams (either -1, 0, or 1). Maybe somewhat surprisingly this makes a difference,
outs: the number of outs when the runner first got on base (this is categorical because there's no reason outs would be related linearly to the log odds of scoring),
I.2nd: an indicator of whether the runner was on 2nd base (the alternative is the reference level, 1st base),
player: identity of the original runner (for parsimony I'm lumping all pinch runners together as being fast guys),

while limiting the data as described in the introduction.

I'm making the assumption that there are only three outcomes stemming from possibly pinch running: the run won't score either way, it will score either way, or the PR would score when the original player would've been stranded. I'm ignoring the possibility that the slow runner is more susceptible to getting thrown out on the bases (by a force play or otherwise), and with it the effect of that extra out on subsequent runs in the inning. I imagine the frequency of being thrown out when the PR wouldn't have been is low compared to the frequency of times this slow runner just gets held up by the 3rd base coach when the PR wouldn't have been. Regardless, it might get too complicated figuring out how much this extra out affects the distribution of run scoring in the inning. However, if I fail to find that pinch running is beneficial (in part 3), I may have to revisit this assumption. I can think of two different ways to weaken it:

instead of the binary response, the response could be a count of the number of runs scored in the inning subsequent to this runner reaching base. I balked at that for this entry because in an entry last year, I had trouble getting R to do multinomial regression with covariance matrices provided - but that data set was bigger, so it may be worth a shot in my next post,
instead of separating steps 1 and 2, just fit a binary logistic regression with win/loss as the response. This gets at the question directly, but I'm worried that the response will be pretty noisy.

I considered the past 15 seasons (1995-2009), where a total of 720 different players satisfied the following two conditions: (i) they were both pinch run for and allowed to run for themselves in the 8th inning or later of a tie or one-run game, and (ii) they scored or had their PR score. The second condition removes about half a percent of the population left after the first condition, and I think it creates a minor selection bias - non-PR situations are being deleted more than PR situations (because there are many more non-PR situations), and the proportion of deletions corresponding to successes is probably not exactly the same as the proportion left in the sample. But without the second condition it would be impossible to estimate a player effect, so I required it anyway.

There are a number of interactions that could potentially enter the logistic model, but only two interactions were close to significant (for more on the I.PR:player interaction see the end of this post); they are shown below. I used the default treatment contrasts in R for I.PR, lead, outs, and I.2nd, but sum contrasts for the players, so the intercept corresponds to the log odds of scoring for the average player, running for himself, after a lead-off single when trailing by one run. The parameter estimates and their p-values are listed below; the residual deviance is 32836 on 27742 degrees of freedom, making the overdispersion parameter estimate less than 1.09, so I didn't bother allowing for overdispersion. It would not have changed any of the conclusions anyway. I've included David Ortiz's estimate below because he will come up again in a moment.

intercept	-0.430	2e-16 ***
I.PR	0.086	0.161
lead0	0.104	0.004 **
lead1	0.216	2.48e-07 ***
outs1	-0.661	2e-16 ***
outs2	-1.476	2e-16 ***
I.2nd	0.774	2e-16 ***
outs1:I.2nd	-0.207	0.011 *
outs2:I.2nd	-0.305	0.002 **
I.PR:lead0	0.161	0.046 *
I.PR:lead1	0.146	0.209
Ortiz	0.674	0.010 *
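For reference, the dispersion arithmetic from the deviance and degrees of freedom quoted above is sketched below in Python. Note that deviance divided by degrees of freedom is about 1.18, while its square root (the factor by which a quasi-binomial fit would inflate the standard errors) is just under 1.09. The text does not say which convention the "less than 1.09" figure uses, so matching it to the square-root scale is an assumption.

```python
import math

residual_deviance = 32836  # quoted in the text
residual_df = 27742        # quoted in the text

# Deviance-based dispersion estimate.
ratio = residual_deviance / residual_df
# Factor by which standard errors would grow under a quasi-binomial fit.
se_inflation = math.sqrt(ratio)

print(f"deviance / df = {ratio:.3f}, sqrt = {se_inflation:.3f}")
```

On either scale the inflation is modest: widening the standard errors by about 9% would not move any of the clearly significant coefficients across the usual thresholds.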

The coefficient estimates of interest are the ones concerning I.PR. The main effect for I.PR says that exp(0.086) is the multiplicative change in the odds of scoring created by pinch running when lead=-1, and similarly adding the main effect to each interaction term allows estimation of the multiplicative change in the odds when lead=0 or 1. I expect the more meaningful number in part 3 will be the additive change in probability - this will be different depending on the situation. The following table compares the probability of scoring a run with no PR to the probability of scoring a run with a PR in all 18 situations. This is done for an average player. A Taylor series expansion was used to estimate the variance of the difference of the probability estimates, and normality was assumed to calculate the p-values for the one-sided tests of those differences.

situation	P(run) no PR	P(run) PR	p-value
lead=-1,base=1,outs=0	0.394	0.415	0.081
lead=-1,base=1,outs=1	0.251	0.268	0.082
lead=-1,base=1,outs=2	0.129	0.139	0.084
lead=-1,base=2,outs=0	0.585	0.606	0.080
lead=-1,base=2,outs=1	0.372	0.392	0.082
lead=-1,base=2,outs=2	0.192	0.206	0.084
lead=0,base=1,outs=0	0.419	0.480	0.000
lead=0,base=1,outs=1	0.271	0.323	0.000
lead=0,base=1,outs=2	0.142	0.174	0.000
lead=0,base=2,outs=0	0.610	0.667	0.000
lead=0,base=2,outs=1	0.396	0.457	0.000
lead=0,base=2,outs=2	0.209	0.252	0.000
lead=1,base=1,outs=0	0.447	0.504	0.011
lead=1,base=1,outs=1	0.294	0.344	0.013
lead=1,base=1,outs=2	0.156	0.189	0.016
lead=1,base=2,outs=0	0.636	0.688	0.009
lead=1,base=2,outs=1	0.423	0.481	0.012
lead=1,base=2,outs=2	0.228	0.271	0.015
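The probabilities in the table above follow from the coefficient estimates by adding up the relevant terms on the log-odds scale and applying the inverse logit. The Python sketch below does this using the rounded coefficients listed earlier, so it can differ from the table by about 0.001 in places. The function name and the `player_effect` argument are illustrative; the p-values are not reproduced because they need the coefficient covariance matrix, which is not given here.

```python
import math

# Coefficients copied from the fitted model above (log-odds scale).
COEF = {
    "intercept": -0.430, "I.PR": 0.086,
    "lead0": 0.104, "lead1": 0.216,
    "outs1": -0.661, "outs2": -1.476,
    "I.2nd": 0.774,
    "outs1:I.2nd": -0.207, "outs2:I.2nd": -0.305,
    "I.PR:lead0": 0.161, "I.PR:lead1": 0.146,
}

def p_run(lead, base, outs, pr, player_effect=0.0):
    """P(run scores); lead=-1, outs=0, base=1, no PR are reference levels."""
    eta = COEF["intercept"] + player_effect
    if lead in (0, 1):
        eta += COEF[f"lead{lead}"]
    if outs in (1, 2):
        eta += COEF[f"outs{outs}"]
    if base == 2:
        eta += COEF["I.2nd"]
        if outs in (1, 2):
            eta += COEF[f"outs{outs}:I.2nd"]
    if pr:
        eta += COEF["I.PR"]
        if lead in (0, 1):
            eta += COEF[f"I.PR:lead{lead}"]
    return 1 / (1 + math.exp(-eta))  # inverse logit

for lead in (-1, 0, 1):
    for base in (1, 2):
        for outs in (0, 1, 2):
            print(f"lead={lead},base={base},outs={outs}\t"
                  f"{p_run(lead, base, outs, False):.3f}\t"
                  f"{p_run(lead, base, outs, True):.3f}")
```

Passing `player_effect=0.674` (the Ortiz coefficient) gives the corresponding probabilities for David Ortiz instead of the average player.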

The following table is the same as the previous one, except it's for David Ortiz instead of the average player. He often hit prior to Manny Ramirez or Kevin Youkilis (good hitters).

situation	P(run) no PR	P(run) PR	p-value
lead=-1,base=1,outs=0	0.561	0.582	0.080
lead=-1,base=1,outs=1	0.397	0.418	0.082
lead=-1,base=1,outs=2	0.226	0.241	0.086
lead=-1,base=2,outs=0	0.735	0.751	0.082
lead=-1,base=2,outs=1	0.537	0.559	0.080
lead=-1,base=2,outs=2	0.318	0.337	0.083
lead=0,base=1,outs=0	0.586	0.644	0.000
lead=0,base=1,outs=1	0.422	0.483	0.000
lead=0,base=1,outs=2	0.244	0.293	0.000
lead=0,base=2,outs=0	0.754	0.797	0.000
lead=0,base=2,outs=1	0.563	0.623	0.000
lead=0,base=2,outs=2	0.341	0.398	0.000
lead=1,base=1,outs=0	0.613	0.666	0.010
lead=1,base=1,outs=1	0.450	0.507	0.011
lead=1,base=1,outs=2	0.266	0.313	0.016
lead=1,base=2,outs=0	0.774	0.812	0.012
lead=1,base=2,outs=1	0.590	0.645	0.010
lead=1,base=2,outs=2	0.366	0.422	0.013

The Ortiz coefficient's standard error was 0.263, so these probability estimates may well be larger than the true probabilities. Surely they are: the two-out numbers above suggest hitting about .400 with RISP in the long term, and even Manny Ramirez cannot do that.

We can see from the low p-values in both tables that pinch running helps increase the chance of scoring, at least when leading or tied. The PR effect is actually much bigger than I imagined it would be. Other than the large probability increases from the 1st table to the 2nd, there are a couple of interesting things. First, as the I.PR:lead interaction suggested, the PR effect is much smaller when the team is trailing; the raw probability of scoring is also smaller when trailing. I think the latter is easily explained by the fact that the opposing team's best pitchers tend to be their set-up man and closer, and these are the pitchers you face when trailing by one run in the 8th or 9th inning. Another less likely explanation might be that if you are losing by one run, you are not hitting as well on the day, and hence less likely to score. This gets into the "hot hand effect" though, and so I think the likely explanation is the opposing pitching.

I don't think the smaller PR effect in games where the team is trailing is as intuitive, but it probably also has a lot to do with facing the opposing team's best pitching: the speed of your base runner is not important if the subsequent hitters are not getting hits. I actually fit a Poisson regression with expected runs scored subsequent to the runner reaching base as the response, and interestingly the expected runs scored with the PR when trailing by one were lower than the expected runs with no PR. The effect was extremely small and not statistically significant, but given the size of my sample I wouldn't have expected to see even that. I wonder if it has to do with a different strategy being employed when trailing - the PR tries to steal 2nd base, or the manager tries to bunt him over to 2nd base. Either of these moves certainly decreases the expected runs scored in the inning (although possibly increasing the chance of scoring at least one run).

By not allowing for an interaction between I.PR and player, which would measure how fast the original base runner is, I've basically assumed that David Ortiz has the same speed as the average player (at least, average among those who have ever been pinch run for - which is almost everybody at one time or another). But the variation in each of these interaction estimates would be even bigger than the variation we're now seeing in the player estimates, so I didn't bother with this here; I will have to include this in my final analysis in part 3 though, perhaps averaging several players together to more precisely estimate an overall effect of pinch running for good hitters. Anyway, because Ortiz is unusually slow, you can probably safely take away a little bit from Ortiz's probabilities and add a little bit to his PR's.

This wraps up step 1. I've had to make some simplifications and assumptions to fit this logistic model (some of which I might have to try to relax in my next post), but at least the data have allowed me to show that pinch running significantly increases the chance of scoring.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".