Statistics in Baseball: 2010

Tuesday, October 19, 2010

Big changes!

I've got a great new job, starting in January, for a sports betting company in London. This means that the blog entries will probably stop for good because my baseball modelling efforts will go into the job. I may still post once in a while about other sports that my employers don't bet on - like maybe curling!

Monday, October 11, 2010

The Home Run Derby effect?

Do players who participate in the All-Star Game Home Run Derby screw up their swing, go into a slump, and have a poor second half? I've heard this one from the talking heads before, and it sounds completely false to me. Mark McGwire used to put on a show in batting practice in 1998, and it didn't stop him from hitting 70 home runs.

I found the Home Run Derby data on the MLB website, but they weren't in CSV or tab-separated format, so I had to do some manipulation. I would have liked data on just the few games following the HR Derby to check for slumps, but I settled for first- and second-half splits (as determined by the All-Star break) from Baseball-Reference. I used the years 2003-2009.

OPS is a good measure of how effective a hitter is, so I thought it would be best to compare pre- and post-break numbers in terms of OPS (I tried some other things and they led to similar conclusions). Baseball-Reference has a statistic called sOPS+ which measures a player's OPS relative to the league. This controls for season, but since I was looking at first and second half differences, this wasn't too important. I tried it anyway, and it gave almost the exact same results as OPS, so I stuck with OPS.

Although some players appeared in more than one HR Derby between 2003 and 2009, I assumed that the 56 differences between pre-break OPS and post-break OPS were independent. The differences looked roughly Gaussian. The average pre-break OPS was .958 and the average post-break OPS was .924, giving a one-sided p-value of 0.02 in a paired t-test - so it's true, the participants do worse after the break! This is supported by the fact that the players' mean career pre- and post-break OPS are not significantly different - the decrease seems to happen specifically in the year the players compete in the HR Derby.
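Here is a minimal Python sketch of a one-sided paired t-test, for readers who want to reproduce this kind of test outside R. It is not the code used for this post. The p-value comes from numerically integrating Student's t density (scipy.stats.ttest_rel does the same job if SciPy is available). The function names and the OPS numbers in the example are hypothetical.

```python
import math
import statistics


def paired_t(pre, post):
    """t statistic and degrees of freedom for the paired differences pre - post."""
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # sample SD uses n - 1
    return statistics.mean(diffs) / se, n - 1


def t_upper_tail(t, df, steps=2000):
    """One-sided p-value P(T >= t) for Student's t with df degrees of freedom,
    using Simpson's rule on the density between 0 and |t| (steps must be even)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps
    total = density(0.0) + density(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    central = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 - central if t >= 0 else 0.5 + central


# Hypothetical pre- and post-break OPS for five Derby participants.
pre_break = [0.980, 0.945, 1.010, 0.920, 0.965]
post_break = [0.930, 0.950, 0.940, 0.905, 0.900]
t_stat, dof = paired_t(pre_break, post_break)
p_value = t_upper_tail(t_stat, dof)  # H1: post-break OPS is lower
```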

But... how are hitters selected for the HR Derby? By having a very good first half. The hitters participating are ones who have often done unusually well in the first half, and were heading for a drop-off in the second half whether they took part in the derby or not. I can think of two ways to get around this and answer the question of whether the HR Derby causes the poorer second half. One is to compare the second half OPS in HR Derby years to the second half OPS in non-HR Derby years, and the other is to see if players who take part in more rounds of the derby have a bigger second half drop-off than players who are eliminated early. (I could also look at the second half drop-off for players in the All-Star Game who weren't in the HR Derby, but it was enough of a pain getting the data for just these players, so I'll try to avoid this approach.)
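A small simulation shows how selection alone can produce this kind of drop. The sketch below assumes, purely for illustration, that each hitter has a fixed true OPS and that each half-season adds independent noise. The talent and noise sizes are made-up numbers. There is no Derby effect in this model, yet the hitters chosen for their first halves still show a second-half drop on average.

```python
import random
import statistics


def simulate_selection(n_players=5000, n_selected=56, seed=1):
    """Average first-half minus second-half OPS for the hitters with the best
    first halves, in a world with no Home Run Derby effect at all."""
    rng = random.Random(seed)
    halves = []
    for _ in range(n_players):
        talent = rng.gauss(0.750, 0.060)      # hypothetical true OPS
        first = talent + rng.gauss(0, 0.070)  # hypothetical half-season noise
        second = talent + rng.gauss(0, 0.070)
        halves.append((first, second))
    # Invite the hitters with the best first halves, as Derby selection roughly does.
    chosen = sorted(halves, key=lambda h: h[0], reverse=True)[:n_selected]
    return statistics.mean([first - second for first, second in chosen])


drop = simulate_selection()
```

The second-half noise is drawn independently of the first-half noise, so the drop comes entirely from selecting hitters who had a lucky first half.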

The mean career second half OPS of the 56 HR Derby hitters is .894, and in HR Derby years it is .924, so if anything their second halves are better in Derby years. This is still a bit unsatisfactory because the HR Derby year is presumably in the prime of their career, so let's try the second way. Consider the number of swings taken in the competition by each player. This equals ten times the number of rounds they were in (a round lasts ten outs), plus their HR total. When I fit a linear regression of the decrease in OPS on the number of HR Derby swings, it was apparent that the more swings a player takes, the smaller his drop from pre-break to post-break OPS, i.e. the opposite of the proposed effect. So I'm pretty comfortable writing the poorer second halves off to selection bias.
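The swing count and the regression slope are simple enough to sketch directly. The function names and the per-player numbers below are hypothetical. A negative slope corresponds to the pattern described above, where more Derby swings go with a smaller drop.

```python
def derby_swings(rounds, home_runs):
    """Swings as counted in the post: ten outs per round plus the HR total."""
    return 10 * rounds + home_runs


def least_squares(x, y):
    """Intercept and slope of the ordinary least-squares line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical participants: (rounds reached, Derby HRs, pre-break minus post-break OPS).
players = [(1, 5, 0.060), (1, 7, 0.045), (2, 15, 0.030), (3, 22, 0.010)]
swings = [derby_swings(r, hr) for r, hr, _ in players]
drops = [d for _, _, d in players]
intercept, slope = least_squares(swings, drops)
```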

Thanks to Bret Hanlon for the idea for this post.

Sunday, October 10, 2010

Starting pitchers in their last inning of the game

In my previous post I compared the FIP for starting pitchers in their final inning of the game to the FIP of their team's bullpen, and found that they're being left in the game too long. Today I'm just going to present three tables that I created from the R code used to make that post. I limited the data to the AL in 2009 in the previous post because the bullpen data had to be collated manually, but no such barrier exists this time, so I used both the AL and NL from 2002-2009, limiting the data to pitchers who started at least 50 games over that time.

This first table looks at the pitchers who had the most significant drop-off (all of them had p-values less than 10^(-6)) from their non-final inning numbers to their final inning numbers. 233 of the 241 pitchers in the sample had a significant drop-off (p-value less than 0.05), but these were the most severe. I don't know how much this difference really means, because the important number for managerial decisions is the late FIP. Still, many of these guys are good pitchers and are probably pitching in close games pretty often (unlike somebody who has a really high FIP throughout the game), so their team would benefit if the manager got them out one inning earlier. Of course, knowing when a pitcher is going to start getting knocked around in a given game is impossible - sometimes he might need to be pulled in the 8th, other times in the 7th, etc. However, it seems managers should be especially attentive with these pitchers, and get them out of the game as soon as they show even a slight decrease in velocity.

pitcher	early FIP	late FIP
Burnett, A.J.	3.351	5.907
Byrd, Paul	4.182	7.196
Fogg, Josh	4.529	8.090
Garland, John	4.210	7.301
Hernandez, Livan	4.039	6.670
Lackey, John	3.538	5.755
Lohse, Kyle	4.332	7.152
Meche, Gil	3.940	6.950
Ortiz, Russ	4.183	7.639
Pavano, Carl	3.536	7.049
Perez, Oliver	4.176	7.506
Robertson, Nate	4.160	7.473
Santana, Johan	2.915	5.337
Silva, Carlos	4.143	7.384
Trachsel, Steve	4.444	7.597
Wakefield, Tim	4.149	7.193

This table is less interesting, but these were the only eight pitchers who didn't have a significant last inning FIP increase.

pitcher	early FIP	late FIP	p-value
Williams, David	5.131	6.063	0.187
Smoltz, John	3.114	3.827	0.089
Litsch, Jesse	4.510	5.820	0.085
Ryan, Brendan	4.148	5.208	0.073
Kuroda, Hiroki	3.400	4.561	0.071
Santos, Victor	4.556	5.713	0.066
Galarraga, Armando	4.760	6.435	0.056
Hammel, Jason	3.976	5.410	0.050

The final table shows the nine pitchers who had a last inning FIP over 9.00. I don't think any of them are still starting games. Rick Reed had a good career and my data set just caught the tail end of it. Most have been tried as relief pitchers, and Darren Oliver has actually become a pretty good one.

pitcher	early FIP	late FIP
James, Chuck	4.240	10.827
McClung, Seth	4.669	10.280
Oliver, Darren	5.053	9.661
Kinney, Matt	4.037	9.640
Waechter, Doug	4.830	9.609
Reed, Rick	3.531	9.534
Helling, Rick	4.162	9.433
Owings, Micah	4.872	9.243
Mays, Joe	4.595	9.032

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".

Wednesday, September 1, 2010

When to remove the starting pitcher

I've always thought managers are slow to go to the bullpen. It seems like they usually wait for the starting pitchers to get into trouble instead of trying to get them out before trouble starts, so I decided to look at the data. My strategy was to compare the last inning of starting pitching to the bullpen's average numbers, and look for significant differences. Getting bullpen numbers for each team isn't easy, and I had to copy/paste each team individually from different pages at Baseball-Reference - not wanting to do this for a bunch of different seasons, I just used the 2009 AL numbers. I would speculate that the NL managers are a bit better at pulling the starter because sometimes they are nudged to do so when he comes up to bat. As it turns out, using only 2009 gives plenty of data to see that managers do not pull the starters in a timely manner. In fact, for all 14 teams there is a significant difference between the bullpen and the last inning of their starting pitching.

Rather than using ERA to compare SPs and RPs, I used FIP, adding HBP to BB in the formula: FIP = (13*HR + 3*(BB+HBP) - 2*K)/IP + 3.10. The additive term seems to have changed from 3.20 to 3.10 since I wrote my entry on closers. Using FIP has two advantages. First, it takes a lot of the luck out of the equation (it actually predicts future ERA better than past ERA does). Second, inherited runners don't matter, because we're only considering HR, BB, K, and IP. It's easy to estimate the standard deviation of FIP because it can be written as a function of multinomial probabilities. This is important here because I want to be able to tell if the bullpen's FIP is significantly better than the starters' FIP.
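Here is a sketch of one way to do that calculation. The assumptions are mine, not necessarily the post's exact method:

- Each plate appearance is a multinomial draw over HR, BB+HBP, K, other outs, and everything else.
- Every out comes from a K or from a single "other out" plate appearance, so IP = (K + other outs)/3. Double plays and baserunning outs are ignored.
- The standard error comes from the delta method.

The category labels and function names are hypothetical.

```python
import math

# Hypothetical plate-appearance categories: HR, BB (including HBP), K,
# other outs (assumed to be one out each), and everything else.
CATEGORIES = ("HR", "BB", "K", "OUT", "OTHER")


def fip_from_line(hr, bb_hbp, k, ip, constant=3.10):
    """FIP from counting stats, with HBP folded into BB as in the post."""
    return (13 * hr + 3 * bb_hbp - 2 * k) / ip + constant


def fip_with_se(counts, constant=3.10):
    """FIP and a delta-method standard error, treating plate appearances as
    multinomial draws and innings as (K + other outs) / 3."""
    n = sum(counts[c] for c in CATEGORIES)
    p = {c: counts[c] / n for c in CATEGORIES}
    num = 13 * p["HR"] + 3 * p["BB"] - 2 * p["K"]
    den = p["K"] + p["OUT"]  # outs per plate appearance
    g = 3 * num / den        # FIP minus the constant
    # Partial derivatives of g with respect to each proportion.
    grad = {
        "HR": 39 / den,
        "BB": 9 / den,
        "K": (-6 * den - 3 * num) / den ** 2,
        "OUT": -3 * num / den ** 2,
        "OTHER": 0.0,
    }
    # Multinomial covariance (diag(p) - p p') / n sandwiched by the gradient.
    mean_grad = sum(grad[c] * p[c] for c in CATEGORIES)
    var = (sum(grad[c] ** 2 * p[c] for c in CATEGORIES) - mean_grad ** 2) / n
    return g + constant, math.sqrt(var)


# Hypothetical season: 20 HR, 60 BB+HBP, 180 K, 420 other outs, 120 other PAs.
line = {"HR": 20, "BB": 60, "K": 180, "OUT": 420, "OTHER": 120}
fip, se = fip_with_se(line)
```

Under these assumptions, quadrupling the sample at the same rates halves the standard error, as expected for a multinomial estimate.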

I limited the data to pitchers who started at least 15 games (64 pitchers qualify), figuring by that point the manager should have a good idea of when the pitcher is tiring. Of course I have the advantage of looking at the whole season's data to see where the differences lie - at the beginning of the season, the manager may not know how good his bullpen will be, or how his new pitchers behave in the late innings, etc. But as the huge differences show, SPs should be removed sooner rather than later!

The following table shows the team's average SP FIP for their last inning in the second column, the bullpen's FIP in the third column, and the p-value testing whether RP FIP is less than SP FIP in the final column.

team	SP FIP	RP FIP	p-value
ANA	7.278	4.274	0.000
BAL	7.522	4.557	0.001
BOS	6.334	4.154	0.001
CHA	7.209	3.927	0.000
CLE	7.851	4.686	0.000
DET	6.173	4.666	0.024
KCA	6.737	4.586	0.003
MIN	9.148	4.322	0.000
NYA	5.438	4.329	0.044
OAK	6.265	3.349	0.000
SEA	5.673	4.352	0.046
TBA	7.011	4.487	0.002
TEX	6.402	4.057	0.000
TOR	8.500	4.211	0.000
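A p-value like the ones in the last column can be sketched as a one-sided z-test. This sketch assumes the starters' last-inning FIP and the bullpen FIP are independent and approximately normal, with known standard errors (for example from a delta-method calculation). The function name and the numbers in the example are hypothetical, not the values behind the table.

```python
import math
from statistics import NormalDist


def bullpen_better_p(sp_fip, sp_se, rp_fip, rp_se):
    """One-sided p-value for H1: bullpen FIP < starters' last-inning FIP,
    assuming independent, approximately normal estimates."""
    z = (sp_fip - rp_fip) / math.sqrt(sp_se ** 2 + rp_se ** 2)
    return 1 - NormalDist().cdf(z)


# Hypothetical estimates and standard errors for one team.
p = bullpen_better_p(sp_fip=6.5, sp_se=0.9, rp_fip=4.3, rp_se=0.3)
```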

Joe Girardi of the Yankees and Don Wakamatsu of the Mariners were the best at removing their starters before they got knocked around. But they still seem to leave them in too long, and they might only be the best because both teams had four qualifying SP, all of whom were pretty good - notice they have the best SP numbers of any team.

Most teams actually have slightly better RP numbers than SP numbers (even when removing the last inning pitched for all the SP); I guess this is due to being able to throw harder when you throw fewer pitches (I've also heard pitchers are worse the second time through the order - something to investigate in the future). The differences are quite small, but still, if the managers are aware of this, maybe their bullpens are already operating at their innings limit, and can't come in one inning earlier. Or at least all their good RP are operating at their innings limit. I bet some of the difference that shows up in the table is unavoidable, but it sure seems like it would help to have your best AAA pitcher on the roster, instead of another backup hitter, to eat up some of the bad SP innings.

Next time I'll look at some of the pitchers who have the biggest drop-off in FIP from the first several innings to their final inning, and at the ones who are the worst in their final inning. Since that doesn't require bullpen average FIPs, I'll be able to consider several seasons at once.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".

Sunday, August 1, 2010

Do knuckleballers induce slumps?

The knuckleball is a very unusual pitch. It's thrown much more slowly than other pitches, and it's been said to mess with a hitter's timing. Hitters seem to do fine against the knuckleball (KB) itself, but I've heard people say that they'll go into a slump after facing a knuckleballer. Today I thought I'd investigate. Retrosheet's game logs provide the starting pitcher for every game as well as the basic offensive statistics needed to compute on-base percentage (OBP) and slugging percentage (SLG), so getting the data was easy.

I picked out six knuckleballers, either famous or contemporary: Tom Candiotti, Charlie Hough, Steve Sparks, Tim Wakefield, Phil Niekro, and Hoyt Wilhelm. There are some other famous ones, like Joe Niekro, but they didn't throw it for their whole career. I searched through the data to find every game that one of these pitchers started. For each one, I looked at the opposing team's OBP and SLG in its kth game after facing the knuckleballer (k=1,...,3 was all that was needed). I paired that with the team's average OBP and SLG over the whole season, with the intention of studying the paired differences to look for a post-KB game effect. I assumed that the differences were independent, which is at least approximately true.
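The pairing step can be sketched as follows. The data layout is a hypothetical stand-in for whatever the Retrosheet game logs were parsed into: one chronological list of games per team-season, each game holding the opposing starter's ID and the team's stat for that game. The same function handles SLG by passing a different stat key.

```python
from statistics import mean


def post_kb_pairs(games, kb_pitchers, k, stat="obp"):
    """Pair a team's stat in its k-th game after facing a knuckleballer with
    that team's unweighted season average.

    games: team-season -> chronological list of dicts with keys 'opp_starter' and stat.
    kb_pitchers: set of starter IDs counted as knuckleballers.
    """
    pairs = []
    for log in games.values():
        season_avg = mean(g[stat] for g in log)  # every game counts equally
        for i, game in enumerate(log):
            if game["opp_starter"] in kb_pitchers and i + k < len(log):
                pairs.append((season_avg, log[i + k][stat]))
    return pairs


# Hypothetical three-game log for one team-season.
games = {"NYA-2009": [{"opp_starter": "kb1", "obp": 0.3},
                      {"opp_starter": "x", "obp": 0.2},
                      {"opp_starter": "y", "obp": 0.4}]}
pairs = post_kb_pairs(games, {"kb1"}, 1)
```

A knuckleball start that falls within k games of the end of the season produces no pair, since there is no kth game to compare.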

To calculate the team's overall strength I had to use the unweighted average OBP and SLG, i.e. I averaged the OBP and SLG from all 162 games rather than adding all the count data together to find the weighted average. This was necessary because the weighted average tends to be higher than the value for a typical single game, so comparing it with single-game numbers would have biased the differences toward showing a slump.
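Here is a small sketch of the two averages, using made-up box-score lines. It also shows one plausible reason the pooled (weighted) figure runs high. Games in which a team reaches base often also produce more plate appearances, so pooling the counts gives those games more weight.

```python
def game_obp(g):
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (g["H"] + g["BB"] + g["HBP"]) / (g["AB"] + g["BB"] + g["HBP"] + g["SF"])


def unweighted_obp(games):
    """Average of per-game OBPs, so every game counts equally."""
    return sum(game_obp(g) for g in games) / len(games)


def weighted_obp(games):
    """Pooled OBP: add up the counts over all games, then divide once."""
    keys = ("H", "BB", "HBP", "AB", "SF")
    totals = {key: sum(g[key] for g in games) for key in keys}
    return game_obp(totals)


# Made-up lines: a quiet game (1-for-3) and a busy one (5-for-9 with a walk).
quiet = {"H": 1, "BB": 0, "HBP": 0, "AB": 3, "SF": 0}
busy = {"H": 5, "BB": 1, "HBP": 0, "AB": 9, "SF": 0}
print(unweighted_obp([quiet, busy]), weighted_obp([quiet, busy]))
```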

The differences between the season average and the kth game average have a beautiful bell-shaped curve (for both OBP and SLG), so I used two paired t-tests to see if the differences were significantly greater than zero. Since each difference is actually based on the difference of two averages, both of which might have a slightly different sample size each time (at-bat totals won't be exactly the same every game), I could probably be more efficient by assigning some weights to the differences, but I highly doubt this would make a non-negligible difference in the p-values.

The third column in each of the following two tables shows the average for the kth game after the knuckleball game, k=1,...,3. The second column shows the season average for the teams in the sample. The p-values test OBP > post-KB OBP and SLG > post-KB SLG respectively, and are based on a paired t-test.

game	OBP	post-KB OBP	p-value
1	0.3206	0.3195	0.2566
2	0.3206	0.3226	0.8676
3	0.3206	0.3198	0.3233

game	SLG	post-KB SLG	p-value
1	0.3934	0.3869	0.0174
2	0.3934	0.3939	0.5696
3	0.3934	0.394	0.5872

The knuckleball does seem to sap the team's power the day after they face it. Most of the next-day woes, though, are due to Candiotti and Wakefield, in particular Wakefield. His tables are below:

game	OBP	post-KB OBP	p-value
1	0.3261	0.3163	0.0093
2	0.3261	0.3256	0.4509
3	0.3261	0.3224	0.1968

game	SLG	post-KB SLG	p-value
1	0.4164	0.3958	0.0021
2	0.4164	0.4168	0.5205
3	0.4164	0.4081	0.1328

The overall numbers are higher in Wakefield's tables because he's pitched in an offensive era. Hitters really do seem significantly worse than normal the day after they face him - even with the Bonferroni correction, the SLG decrease is significant, and the OBP decrease is marginally significant. The unweighted averages of OBP and SLG are .316 and .398 in games Wakefield pitches - actually quite close to the averages of the opposing team on the following day, and both lower than the overall average. Hough, Niekro, and Wilhelm are the superior pitchers, but maybe they didn't throw the knuckler as often as Wakefield does, and hence didn't screw up the hitters' next-day timing as much? Or maybe it has something to do with contemporary hitters not seeing the knuckler as often as hitters in the past may have? Steve Sparks doesn't cause this next-day drop-off, though.
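The Bonferroni check can be reproduced from the p-values printed in Wakefield's two tables. The post doesn't say how many tests the correction covered. This sketch assumes all six (OBP and SLG for k=1,2,3).

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# The six p-values from Wakefield's OBP and SLG tables.
wakefield = {
    "OBP game 1": 0.0093, "OBP game 2": 0.4509, "OBP game 3": 0.1968,
    "SLG game 1": 0.0021, "SLG game 2": 0.5205, "SLG game 3": 0.1328,
}
adjusted = bonferroni(wakefield)
```

With six tests, the first-game SLG p-value becomes 0.0126, still under 0.05. The first-game OBP value becomes 0.0558, just above 0.05, which fits calling it marginally significant.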

Thanks to Ben Shaby for suggesting this idea. The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".

Saturday, July 17, 2010

Late-inning defensive substitutions

Sometimes a manager will remove a poor fielder from the game when his team is winning, believing the defensive improvement is enough to offset the loss of this player's bat. This usually happens in the 9th inning. Here's a short discussion about it: Tango analysis. My plan was to consider the following:

win: a binary response for logistic regression indicating if the team held on to win the game,
I.def: an indicator of whether the player was substituted for,
dist: the number of spots in the order until this player is due to bat again. Presumably as this increases, the defensive substitution starts to look better and better,
lead: either one or two runs,
inn: inning - limited to top of the 9th and bottom of the 9th. I didn't want to include the 8th and get into dealing with players who weren't replaced in the 8th but then were in the 9th. The replacement usually happens in the 9th inning anyway,
player: identity of the original player (as in the pinch running analysis, I'm going to lump all the replacements together),
opp: the opposing team,

and to limit the analysis to the AL games to avoid the implications of double switches that happen only in the NL.

To lead into my (short) analysis, I want to continue the selection bias discussion from my previous post. In that situation, requiring all four pairwise combinations of win/loss with pinch runner/no pinch runner to exist for each player created a selection bias and led to an overestimate of the PR main effect. I don't think it affected the analysis very much because my interest was in estimating the sum of this main effect and the average player:PR interaction for 25 good hitters, and this sum should not be affected by a biased main effect - the sum for an individual player estimates his personal PR effect and this is not dependent on any bias in the estimate of the average.

In today's analysis I am looking at defensive replacement by teams in the lead, so they go on to win the game - with or without the replacement - a vast majority of the time. Hence, if I set up the model in the same way as I set up the pinch runner model, with win as the response in a logistic regression, this selection bias will be quite severe. The pairwise combination most often lacking is the loss/replacement, and so many players whose replacement only ever led to victory get deleted from the sample. Proceeding as if everything was normal leads to a hugely negative defensive replacement main effect. Again my interest would be in the sum of this effect and the interaction effects of good hitters, so this is not an insurmountable obstacle.

The obstacle seems to be the lack of repetitions. I was picturing managers replacing their good hitters in close games all the time, and then being left without those hitters when extra innings roll around. As an exploratory analysis, I looked at 25 years of data. I limited it to players hitting in the top five in the order (any lower and the replacement might well be as good as the guy he's replacing) who've been substituted for at least once in the 9th inning with a one or two run lead. I didn't impose any condition on having lost at least one game, and so was unable to estimate a player effect. Out of the 36844 cases remaining, only 629 were defensive replacements - fewer than 2 substitutions per season per team. And of those 36844, only 1490 times did their spot come to bat again - 1428 times with them in it, and 62 times with their replacement in it. At least you'd think that the team wins more often when the player hasn't been replaced, right? 657 wins/1428 games (46%) batting for himself, and 30 wins/62 games (48%) with the replacement - a statistically insignificant difference. Unsurprisingly, when I fit the full logistic regression model described above (again limited to the top five in the order and fit without the player:I.def interaction), the I.def effect was not anywhere close to significant.
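The "657 of 1428 versus 30 of 62" comparison can be checked with a pooled two-proportion z-test. That particular test is my choice for illustration; the post doesn't say which test was used.

```python
import math
from statistics import NormalDist


def two_proportion_test(wins_a, n_a, wins_b, n_b):
    """Two-sided z-test for equal win rates, using the pooled proportion."""
    pa, pb = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (pb - pa) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


# Original hitter batting (657 of 1428) versus replacement batting (30 of 62).
z, p = two_proportion_test(657, 1428, 30, 62)
```

This gives a z of about 0.37 and a two-sided p-value of about 0.71, far from significance.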

This brute force approach to try to get around the noisy data is not going to work here. It certainly looks like the defensive replacement effect, if any, is quite minimal anyway. I'm satisfied for now, but to answer the question properly, I'd have to be able to measure exactly how much a defensive replacement helps the defense, and I don't have the data to do that right now.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".

Sunday, July 4, 2010

Are pinch runners effective? Part 3 of 3

In this entry I'm going to see whether pinch running is a good tradeoff: increasing the probability of scoring by enough to offset the loss of offense in extra innings. My previous post estimated how much pinch running helped the chances of scoring - it was more than I thought. Now we'll complete the second half of the problem.

My original plan was to look only at future innings where the PR (or original hitter if he wasn't pinch run for) came to bat. This would decrease the variance inevitably created by considering opponents' runs or runs in innings when this spot didn't come to bat. Then by finding the probability of this spot returning to the plate in each different situation, I could isolate the negative side of the PR effect, which could then be combined with the estimate of the positive side that I already have. I was hoping this would let me separate the effects of pinch running and the defensive replacement that surely occurs along with it, but these probabilities of returning to bat are going to depend on the opponent's run scoring and this depends a bit on the defense. The main problem though, would be to connect these separate analyses of the good and the bad of pinch running with a sensible variance estimate that allows for the dependence between the two models.

So I scrapped this idea and decided to start by just fitting a multiple logistic regression with win/loss as the binary response, hoping that the noisiness of the data would be offset by the fact that I had the data for games all the way back to 1952. The predictors I considered were:

I.PR: an indicator of whether there was a pinch runner,
lead: the score difference between the two teams (either -1, 0, or 1),
inn: either top 8th, bottom 8th, top 9th, or bottom 9th. I lumped extra innings in with the 9th inning,
outs: the number of outs when the runner first got on base (this is categorical because there's no reason outs would be related linearly to the log odds of scoring),
I.2nd: an indicator of whether the runner was on 2nd base (the alternative is the reference level, 1st base),
player: identity of the original runner (for parsimony I'm lumping all pinch runners together as being fast guys),
opp: the opposing team. This doesn't make much difference because unless you're going to control for which pitcher you're facing, the variance in opposition even within the same team is pretty big due to different pitchers across different generations.
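The post fit these models in R. Here is a language-neutral sketch of what such a fit does: plain maximum-likelihood logistic regression by Newton-Raphson (equivalently, iteratively reweighted least squares). Categorical predictors such as inn, outs, or player would enter X as dummy or contrast columns. The sketch leaves out standard errors and assumes the data are not perfectly separated (otherwise the estimates diverge). The function names and toy data are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def logistic_fit(X, y, iters=25):
    """Logistic regression coefficients by Newton-Raphson.
    X: rows of predictors (leading 1 for the intercept); y: 0/1 responses."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                      # X'(y - p)
        hess = [[0.0] * k for _ in range(k)]  # X'WX with W = p(1 - p)
        for row, yi in zip(X, y):
            p = 1 / (1 + math.exp(-sum(b * x for b, x in zip(beta, row))))
            w = p * (1 - p)
            for i in range(k):
                grad[i] += (yi - p) * row[i]
                for j in range(k):
                    hess[i][j] += w * row[i] * row[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Toy data: intercept plus I.PR; 3 of 10 wins without a PR, 6 of 10 with one.
X = [[1, 0] for _ in range(10)] + [[1, 1] for _ in range(10)]
y = [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4
intercept, pr_effect = logistic_fit(X, y)
```

With a single binary predictor, the fitted coefficients reproduce the observed log odds. The intercept is log(3/7), and the I.PR coefficient is log(6/4) - log(3/7).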

I looked at data for the past 20 years (1990-2009). I used sum contrasts for players (so the average player would have a zero estimate for his effect) and treatment contrasts for all the other predictors. The significant interactions were I.PR:lead, inn:lead, outs:lead, and I.2nd:outs. Notice that there are no data for bottom9th:lead1, because the home team never bats in the bottom of the 9th (or of an extra inning) with the lead. So this combination has no estimate, but that did not preclude proper estimation of the remaining inn:lead effects. This time I wanted to include the I.PR:player interaction because it helps measure exactly what I want to know. Before explaining why, it's best to explain the meaning of the main effects.
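Here is a sketch of the two coding schemes mentioned here. They mirror R's contr.treatment, where the first level is the reference, and contr.sum, where the last level gets -1 in every column. Under sum coding the level effects sum to zero, so the last player's effect is minus the sum of the others. The level names in the example are hypothetical.

```python
def treatment_contrasts(levels):
    """Treatment coding: the first level is the reference (all zeros)."""
    k = len(levels)
    return {lev: [1 if j == i - 1 else 0 for j in range(k - 1)]
            for i, lev in enumerate(levels)}


def sum_contrasts(levels):
    """Sum-to-zero coding: the last level gets -1 in every column, so the
    level effects average to zero."""
    k = len(levels)
    return {
        lev: ([-1] * (k - 1) if i == k - 1
              else [1 if j == i else 0 for j in range(k - 1)])
        for i, lev in enumerate(levels)
    }


players = ["player_a", "player_b", "player_c"]  # hypothetical levels
coding = sum_contrasts(players)
```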

The main effect of player measures mostly how good his team is, particularly the players hitting immediately after him. A player effect of zero means his team is about average. The main effect of I.PR measures how much pinch running for the average player increases the log odds of winning. Note that this "average" is not the same as the other one: this one refers more to the speed of the player himself. It also incorporates how good a hitter he is - because he might return to the plate in extra innings - but as we will see, it's mostly about his speed, just as the player main effect is mostly about his team rather than himself.

But the interaction of I.PR and player provides information about pinch running for this player relative to the average; if it's positive, a PR should be employed for this player at least as often as the average player; if it's negative, the PR should be employed less often than the average. In fact if the sum of the interaction coefficient and the I.PR coefficient is negative, the log odds of winning the game are decreased when the player is pinch run for.

(Aside: the minor selection bias I mentioned in part 2 is exacerbated here. To estimate the player:I.PR interaction, we need all four pairwise combinations of win/loss with PR/no PR for each player, so any player who is lacking one of these combinations is deleted. By far the most likely to be missing are the two involving PR. About 55% of these deletions were loss/PR - a higher percentage of losses than the true PR population contains - so the PR main effect in my analysis has a positive bias. But my goal is to look at player-specific PR effects, and those are based on the sum of the PR main effect with the interaction term, a sum which should be invariant to any bias in the main effect - if the main effect is too high, the interaction estimate will just be lower to balance it out. I wouldn't expect the parameter estimates for the effects not involving PR to be biased.)
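The restriction in this aside amounts to keeping only players seen in all four cells of win/loss by PR/no PR. Here is a minimal sketch, with a hypothetical (player, won, pinch_run) record layout.

```python
from collections import defaultdict


def players_with_all_cells(records):
    """Keep only players observed in all four (win/loss) x (PR/no PR) cells.
    records: iterable of (player, won, pinch_run) tuples with booleans."""
    cells = defaultdict(set)
    for player, won, pinch_run in records:
        cells[player].add((won, pinch_run))
    return {player for player, seen in cells.items() if len(seen) == 4}


records = [
    ("a", True, True), ("a", False, True), ("a", True, False), ("a", False, False),
    ("b", True, True), ("b", True, False), ("b", False, False),
]
kept = players_with_all_cells(records)
```

Here player "b" has no loss with a pinch runner, so all of his games drop out of the fit.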

I don't want to consider just one player at a time because the variance of the interaction coefficient estimates is too large to make an informed conclusion. But what I can do is average the interaction effects of many players together. Without looking at the data, I picked a list of 25 players who I thought were good hitters, but in general pretty slow runners:

Berkman, Lance
Bonds, Barry
Cabrera, Miguel
Dunn, Adam
Giambi, Jason
Guerrero, Vladimir
Gwynn, Tony
Helton, Todd
Holliday, Matt
Howard, Ryan
Jones, Chipper
Kent, Jeff
Lee, Carlos
McGriff, Fred
McGwire, Mark
Ordonez, Magglio
Ortiz, David
Palmeiro, Rafael
Piazza, Mike
Ramirez, Manny
Rodriguez, Ivan
Sheffield, Gary
Sosa, Sammy
Thomas, Frank
Youkilis, Kevin

Most of these guys have been pinch run for tens of times, and not pinch run for over 100 times. If their average interaction coefficient estimate were significantly positive, that would tell me they should be pinch run for at least as often as the average player. (I'm being vague because we don't know yet if the average player should be pinch run for.) But their average I.PR:player coefficient estimate is 0.26 with a standard error of 0.18. I guess I could take an even larger sample of good, slow hitters. Even with the big standard error, though, I'm convinced that this interaction estimate has a lot more to do with speed than hitting ability - the results from part 2 really did show that having a PR helps a lot. The estimate of the main effect of PR is also positive, but as I mentioned earlier, it's biased, so I won't draw any conclusions about the average player here. (I did do a separate analysis without the interaction, and hence without a noticeable selection bias, and the PR effect for the average player was still significant.)
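Averaging the 25 interaction estimates needs their covariances, not just their individual standard errors, because estimates from the same model are correlated. Here is a sketch, assuming the relevant block of the fitted model's covariance matrix is available (in R it would come from vcov on the fit). The function name is hypothetical.

```python
import math


def mean_with_se(coefs, cov):
    """Mean of k coefficient estimates and its standard error, given their
    k x k covariance matrix: Var(mean) = (1' V 1) / k^2."""
    k = len(coefs)
    total_var = sum(cov[i][j] for i in range(k) for j in range(k))
    return sum(coefs) / k, math.sqrt(total_var) / k


# Two hypothetical interaction estimates with a positive covariance.
avg, se = mean_with_se([0.1, 0.3], [[0.04, 0.01], [0.01, 0.04]])
```

For the reported average of 0.26 with a standard error of 0.18, the z-ratio is about 1.44, a two-sided p-value of roughly 0.15.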

The table below is based on the average of the 25 aforementioned players. It gives the probabilities of winning in each of the 66 different situations (don't worry, I used a loop in R to make the html code so I didn't have to type it all). The situation column is as follows: lead, inning, base, outs. The p-values are for the 2-sided test between the two probability estimates.

situation	P(win) no PR	P(win) PR	p-value
-1,t8,1,0	0.325	0.379	0.217
-1,t8,1,1	0.232	0.277	0.226
-1,t8,1,2	0.145	0.177	0.236
-1,t8,2,0	0.399	0.457	0.210
-1,t8,2,1	0.290	0.341	0.221
-1,t8,2,2	0.156	0.190	0.235
-1,b8,1,0	0.442	0.501	0.206
-1,b8,1,1	0.332	0.386	0.217
-1,b8,1,2	0.218	0.261	0.228
-1,b8,2,0	0.521	0.580	0.200
-1,b8,2,1	0.401	0.459	0.210
-1,b8,2,2	0.233	0.278	0.227
-1,t9,1,0	0.216	0.259	0.227
-1,t9,1,1	0.148	0.180	0.234
-1,t9,1,2	0.089	0.110	0.240
-1,t9,2,0	0.275	0.325	0.222
-1,t9,2,1	0.190	0.229	0.230
-1,t9,2,2	0.096	0.118	0.240
-1,b9,1,0	0.296	0.348	0.219
-1,b9,1,1	0.209	0.251	0.228
-1,b9,1,2	0.129	0.158	0.236
-1,b9,2,0	0.367	0.423	0.213
-1,b9,2,1	0.263	0.311	0.223
-1,b9,2,2	0.139	0.170	0.236
0,t8,1,0	0.551	0.662	0.009
0,t8,1,1	0.468	0.583	0.011
0,t8,1,2	0.401	0.516	0.013
0,t8,2,0	0.628	0.729	0.007
0,t8,2,1	0.543	0.654	0.009
0,t8,2,2	0.421	0.537	0.013
0,b8,1,0	0.723	0.806	0.006
0,b8,1,1	0.652	0.749	0.007
0,b8,1,2	0.588	0.694	0.008
0,b8,2,0	0.782	0.851	0.005
0,b8,2,1	0.716	0.801	0.006
0,b8,2,2	0.608	0.712	0.008
0,t9,1,0	0.565	0.674	0.009
0,t9,1,1	0.482	0.597	0.011
0,t9,1,2	0.415	0.530	0.013
0,t9,2,0	0.641	0.740	0.007
0,t9,2,1	0.557	0.667	0.009
0,t9,2,2	0.435	0.551	0.012
0,b9,1,0	0.739	0.819	0.006
0,b9,1,1	0.670	0.764	0.007
0,b9,1,2	0.607	0.711	0.008
0,b9,2,0	0.796	0.861	0.005
0,b9,2,1	0.733	0.814	0.006
0,b9,2,2	0.627	0.728	0.007
1,t8,1,0	0.796	0.879	0.001
1,t8,1,1	0.742	0.843	0.001
1,t8,1,2	0.731	0.835	0.001
1,t8,2,0	0.843	0.909	0.001
1,t8,2,1	0.795	0.879	0.001
1,t8,2,2	0.747	0.846	0.001
1,b8,1,0	0.917	0.954	0.001
1,b8,1,1	0.891	0.938	0.001
1,b8,1,2	0.885	0.935	0.001
1,b8,2,0	0.939	0.966	0.001
1,b8,2,1	0.917	0.954	0.001
1,b8,2,2	0.894	0.940	0.001
1,t9,1,0	0.877	0.930	0.001
1,t9,1,1	0.840	0.907	0.001
1,t9,1,2	0.832	0.902	0.001
1,t9,2,0	0.907	0.948	0.001
1,t9,2,1	0.876	0.929	0.001
1,t9,2,2	0.843	0.909	0.001
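Each row of a table like this comes from two inverse-logit transformations of a fitted linear predictor: one without the pinch runner's log-odds shift and one with it. The shift would combine, for example, the I.PR main effect, the relevant I.PR:lead term, and the average I.PR:player interaction. Here is a sketch with hypothetical inputs. Because the shift is on the log-odds scale, the same shift moves P(win) by different amounts depending on the baseline, which is one reason the gaps vary from row to row.

```python
import math


def inv_logit(eta):
    """Convert log odds to a probability."""
    return 1 / (1 + math.exp(-eta))


def logit(p):
    """Convert a probability to log odds."""
    return math.log(p / (1 - p))


def win_probs(eta_no_pr, pr_shift):
    """P(win) without and with a pinch runner, given the linear predictor for
    the situation without a PR and the total PR shift on the log-odds scale."""
    return inv_logit(eta_no_pr), inv_logit(eta_no_pr + pr_shift)


# The same hypothetical shift applied at three different baselines.
shift = 0.45
for base in (0.15, 0.55, 0.90):
    no_pr, with_pr = win_probs(logit(base), shift)
    print(f"{base:.2f} -> {with_pr:.3f} (gain {with_pr - no_pr:.3f})")
```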

A couple interesting things I notice:

the win probability estimates when trailing by one or when tied are higher with a runner on 1st and no out than with a runner on 2nd and one out. Newsflash: bunting is dumb in general, even when you only need one run.
some of the PR effects are shockingly large compared to what I had estimated in part 2 for the change in probability of the run scoring. I can only say that I double checked these estimates, and that the estimates from part 2 hadn't allowed for the slowness of the hitter. The other difference I can think of is that (for convenience) my code counted the runner as having scored even if he'd been erased by a fielder's choice and a subsequent runner scored. If this was more common in non-PR situations, it could have led to an understatement of the true PR effect in part 2.

In conclusion, I am convinced that pinch running in the late innings is an effective strategy. This was a bit surprising to me, but apparently the small chance of extra innings (and the new, poorer hitter being unable to contribute to a run that the good hitter could have) is outweighed by the relatively large improvement in run scoring in the current inning. The effect seems to get even bigger when the hitter is slow, even if he's a great hitter.

I think my next entry will be about late-inning defensive replacements.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".

Saturday, July 3, 2010

Are pinch runners effective? Part 2 of 3

For the introduction (part 1), see my previous post. In this entry I'm going to estimate the increase in the probability of scoring provided by a pinch runner (PR). My next post (part 3) will attempt to answer the ultimate question: is this increase in probability enough to justify weakening the offense for future innings?

After some R coding to count the necessary information, I fit a multiple logistic regression model with:

I.run: binary response, the indicator of whether the run scored,
I.PR: an indicator of whether there was a pinch runner,
lead: the score difference between the two teams (either -1, 0, or 1). Maybe somewhat surprisingly this makes a difference,
outs: the number of outs when the runner first got on base (this is categorical because there's no reason outs would be related linearly to the log odds of scoring),
I.2nd: an indicator of whether the runner was on 2nd base (the alternative is the reference level, 1st base),
player: identity of the original runner (for parsimony I'm lumping all pinch runners together as being fast guys),

while limiting the data as described in the introduction.

I'm making the assumption that there are only three outcomes stemming from a possible pinch running decision: the run won't score either way, it will score either way, or the PR would score when the original player would've been stranded. I'm ignoring the possibility that the slow runner is more susceptible to getting thrown out on the bases (by a force play or otherwise), and hence the effect of that extra out on subsequent runs in the inning, which become less likely. I imagine the frequency of the slow runner being thrown out when the PR wouldn't have been is low compared to the frequency of times he just gets held up by the 3rd base coach when the PR wouldn't have been. Regardless, it might get too complicated to figure out how much this extra out affects the distribution of run scoring in the inning. However, if I fail to find that pinch running is beneficial (in part 3), I may have to revisit this assumption. I can think of two different ways to weaken it:

instead of the binary response, the response could be a count of the number of runs scored in the inning subsequent to this runner reaching base. I balked at that for this entry because in an entry last year, I had trouble getting R to do multinomial regression with covariance matrices provided - but that data set was bigger, so it may be worth a shot in my next post,
instead of separating steps 1 and 2, just fit a binary logistic regression with win/loss as the response. This gets at the question directly, but I'm worried that the response will be pretty noisy.

I considered the past 15 seasons (1995-2009), where a total of 720 different players satisfied the following two conditions: (i) they were both pinch run for and allowed to run for themselves in the 8th inning or later of a tie or one-run game, and (ii) they scored or had their PR score. The second condition removes about half a percent of the population left after the first condition, and I think it creates a minor selection bias - non-PR situations are being deleted more than PR situations (because there are many more non-PR situations), and the proportion of deletions corresponding to successes is probably not exactly the same as the proportion left in the sample. But without the second condition it would be impossible to estimate a player effect, so I required it anyway.
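Here's a minimal Python sketch of the two player-level conditions. It is not the R code I actually used. It assumes a hypothetical list of (player_id, pinch_run, scored) tuples, one per qualifying late-inning situation. Condition (ii) matters because, for a player whose runs never scored, the maximum likelihood estimate of his effect in the logistic model would diverge to minus infinity.

```python
def eligible_players(records):
    """Players meeting both conditions.

    records: iterable of (player_id, pinch_run, scored) tuples, one per
    qualifying situation in which player_id reached base (hypothetical format).
    """
    seen = {}
    for player, pinch_run, scored in records:
        flags = seen.setdefault(player, {"pr": False, "self": False, "scored": False})
        # Condition (i): note whether he was pinch run for or ran for himself
        flags["pr" if pinch_run else "self"] = True
        # Condition (ii): at least one run scored (by him or by his PR)
        flags["scored"] = flags["scored"] or bool(scored)
    return {p for p, f in seen.items() if f["pr"] and f["self"] and f["scored"]}


records = [
    ("a", True, False), ("a", False, True),   # meets both conditions
    ("b", True, False), ("b", False, False),  # never scored
    ("c", False, True),                       # never pinch run for
]
sample = [r for r in records if r[0] in eligible_players(records)]
```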

There are a number of interactions that could potentially enter the logistic model, but only two interactions were close to significant (for more on the I.PR:player interaction see the end of this post); they are shown below. I used the default treatment contrasts in R for I.PR, lead, outs, and I.2nd, but sum contrasts for the players, so the intercept corresponds to the log odds of scoring for the average player, running for himself, after a lead-off single when trailing by one run. The parameter estimates and their p-values are listed below; the residual deviance is 32836 on 27742 degrees of freedom, making the overdispersion parameter estimate less than 1.09, so I didn't bother allowing for overdispersion. It would not have changed any of the conclusions anyway. I've included David Ortiz's estimate below because he will come up again in a moment.

term	estimate	p-value
intercept	-0.430	< 2e-16 ***
I.PR	0.086	0.161
lead0	0.104	0.004 **
lead1	0.216	2.48e-07 ***
outs1	-0.661	< 2e-16 ***
outs2	-1.476	< 2e-16 ***
I.2nd	0.774	< 2e-16 ***
outs1:I.2nd	-0.207	0.011 *
outs2:I.2nd	-0.305	0.002 **
I.PR:lead0	0.161	0.046 *
I.PR:lead1	0.146	0.209
Ortiz	0.674	0.010 *
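To make the sum contrasts for the players concrete, here's a small pure-Python sketch. It builds the same matrix as R's contr.sum(k) and expands the k - 1 fitted player coefficients into all k player effects. Because the effects sum to zero, the intercept describes the average player. The function names and the three-player example are hypothetical, for illustration only.

```python
def contr_sum(k):
    """Sum-to-zero contrast matrix like R's contr.sum(k): k rows (levels),
    k - 1 columns (coefficients); the last level gets -1 in every column."""
    rows = [[1.0 if i == j else 0.0 for j in range(k - 1)] for i in range(k - 1)]
    rows.append([-1.0] * (k - 1))
    return rows


def player_effects(coefs):
    """Expand k - 1 fitted sum-contrast coefficients into all k level effects."""
    matrix = contr_sum(len(coefs) + 1)
    return [sum(c * b for c, b in zip(row, coefs)) for row in matrix]


effects = player_effects([0.5, -0.2])  # hypothetical 3-player example
```

The last player's effect is minus the sum of the others (here -0.3), which is why no single player acts as the reference level.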

The coefficient estimates of interest are the ones involving I.PR. The main effect for I.PR says that exp(0.086) is the multiplicative change in the odds of scoring created by pinch running when lead=-1; similarly, exponentiating the sum of the main effect and each interaction term gives the multiplicative change in the odds when lead=0 or 1. I expect the more meaningful number in part 3 will be the additive change in probability, which will differ from situation to situation. The following table compares the probability of scoring a run with no PR to the probability of scoring a run with a PR in all 18 situations (three score differences, two bases, three out counts), for an average player. A first-order Taylor series expansion (the delta method) was used to estimate the variance of the difference of the probability estimates, and normality was assumed to calculate the p-values for the one-sided tests of those differences.

situation	P(run) no PR	P(run) PR	p-value
lead=-1,base=1,outs=0	0.394	0.415	0.081
lead=-1,base=1,outs=1	0.251	0.268	0.082
lead=-1,base=1,outs=2	0.129	0.139	0.084
lead=-1,base=2,outs=0	0.585	0.606	0.080
lead=-1,base=2,outs=1	0.372	0.392	0.082
lead=-1,base=2,outs=2	0.192	0.206	0.084
lead=0,base=1,outs=0	0.419	0.480	0.000
lead=0,base=1,outs=1	0.271	0.323	0.000
lead=0,base=1,outs=2	0.142	0.174	0.000
lead=0,base=2,outs=0	0.610	0.667	0.000
lead=0,base=2,outs=1	0.396	0.457	0.000
lead=0,base=2,outs=2	0.209	0.252	0.000
lead=1,base=1,outs=0	0.447	0.504	0.011
lead=1,base=1,outs=1	0.294	0.344	0.013
lead=1,base=1,outs=2	0.156	0.189	0.016
lead=1,base=2,outs=0	0.636	0.688	0.009
lead=1,base=2,outs=1	0.423	0.481	0.012
lead=1,base=2,outs=2	0.228	0.271	0.015
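The delta method calculation behind the p-values in the table above can be sketched in Python as follows. The fitted covariance matrix isn't reported in this post. The usage example therefore uses a made-up two-parameter model (just an intercept and I.PR) to show the mechanics, and it will not reproduce the table's p-values. The function name is hypothetical.

```python
import math


def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def pr_difference_test(beta, cov, x_no_pr, x_pr):
    """Delta-method test of P(run | PR) - P(run | no PR) in a logistic model.

    beta: coefficient estimates; cov: their covariance matrix (list of lists);
    x_no_pr, x_pr: design rows for the same situation without and with a PR.
    Returns (difference, standard error, one-sided p-value for difference > 0).
    """
    p0 = expit(sum(b * x for b, x in zip(beta, x_no_pr)))
    p1 = expit(sum(b * x for b, x in zip(beta, x_pr)))
    diff = p1 - p0
    # Gradient of the difference w.r.t. beta, using d expit(eta)/d eta = p(1 - p)
    grad = [p1 * (1 - p1) * a - p0 * (1 - p0) * b for a, b in zip(x_pr, x_no_pr)]
    # First-order Taylor approximation: Var(diff) ~ grad' * cov * grad
    n = len(grad)
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(n) for j in range(n))
    se = math.sqrt(var)
    # Upper-tail normal probability P(Z > diff / se)
    p_value = 0.5 * math.erfc(diff / se / math.sqrt(2))
    return diff, se, p_value


# Made-up two-parameter model: columns are (intercept, I.PR)
beta = [-0.4, 0.2]
cov = [[0.01, -0.004], [-0.004, 0.02]]
diff, se, p = pr_difference_test(beta, cov, x_no_pr=[1, 0], x_pr=[1, 1])
```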

The following table is the same as the previous one, except it's for David Ortiz instead of the average player. He often hit prior to Manny Ramirez or Kevin Youkilis (good hitters).

situation	P(run) no PR	P(run) PR	p-value
lead=-1,base=1,outs=0	0.561	0.582	0.080
lead=-1,base=1,outs=1	0.397	0.418	0.082
lead=-1,base=1,outs=2	0.226	0.241	0.086
lead=-1,base=2,outs=0	0.735	0.751	0.082
lead=-1,base=2,outs=1	0.537	0.559	0.080
lead=-1,base=2,outs=2	0.318	0.337	0.083
lead=0,base=1,outs=0	0.586	0.644	0.000
lead=0,base=1,outs=1	0.422	0.483	0.000
lead=0,base=1,outs=2	0.244	0.293	0.000
lead=0,base=2,outs=0	0.754	0.797	0.000
lead=0,base=2,outs=1	0.563	0.623	0.000
lead=0,base=2,outs=2	0.341	0.398	0.000
lead=1,base=1,outs=0	0.613	0.666	0.010
lead=1,base=1,outs=1	0.450	0.507	0.011
lead=1,base=1,outs=2	0.266	0.313	0.016
lead=1,base=2,outs=0	0.774	0.812	0.012
lead=1,base=2,outs=1	0.590	0.645	0.010
lead=1,base=2,outs=2	0.366	0.422	0.013
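As a check on the two tables above, the probabilities can be recomputed directly from the coefficient estimates. This Python sketch applies the treatment-contrast coding described earlier, and its function names are hypothetical. The player_effect argument is 0 for the average player, and setting it to Ortiz's estimate of 0.674 gives the Ortiz table. Because it starts from the rounded coefficients, the last digit can occasionally differ from the tables.

```python
import math

# Coefficient estimates from the logistic regression table above
COEF = {
    "intercept": -0.430, "I.PR": 0.086,
    "lead0": 0.104, "lead1": 0.216,
    "outs1": -0.661, "outs2": -1.476, "I.2nd": 0.774,
    "outs1:I.2nd": -0.207, "outs2:I.2nd": -0.305,
    "I.PR:lead0": 0.161, "I.PR:lead1": 0.146,
}


def log_odds(lead, base, outs, pinch_run, player_effect=0.0):
    """Linear predictor under treatment contrasts; the reference situation is
    lead=-1, runner on 1st, no outs, no PR. player_effect=0 is the average player."""
    eta = COEF["intercept"] + player_effect
    if lead != -1:
        eta += COEF[f"lead{lead}"]
    if outs != 0:
        eta += COEF[f"outs{outs}"]
    if base == 2:
        eta += COEF["I.2nd"]
        if outs != 0:
            eta += COEF[f"outs{outs}:I.2nd"]
    if pinch_run:
        eta += COEF["I.PR"]
        if lead != -1:
            eta += COEF[f"I.PR:lead{lead}"]
    return eta


def p_run(lead, base, outs, pinch_run, player_effect=0.0):
    """Estimated probability that the run scores."""
    return 1.0 / (1.0 + math.exp(-log_odds(lead, base, outs, pinch_run, player_effect)))


def pr_odds_ratio(lead):
    """Multiplicative change in the odds of scoring from pinch running."""
    return math.exp(log_odds(lead, 1, 0, True) - log_odds(lead, 1, 0, False))


for lead in (-1, 0, 1):
    for base in (1, 2):
        for outs in (0, 1, 2):
            print(f"lead={lead},base={base},outs={outs}",
                  round(p_run(lead, base, outs, False), 3),
                  round(p_run(lead, base, outs, True), 3))
```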

The Ortiz coefficient's standard error was 0.263, so there is considerable uncertainty in these estimates, and they may be larger than the true probabilities. Surely they are: the two-out numbers above imply that the hitters behind him would be hitting about .400 with RISP over the long term, and even Manny Ramirez cannot do that.
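The p-value of 0.010 for the Ortiz coefficient is consistent with a standard Wald test using that standard error. The corresponding approximate 95% interval shows how wide the uncertainty is. Here's a quick sketch in Python (the function name is my own):

```python
import math


def wald_summary(estimate, se, z_crit=1.959964):
    """Wald z statistic, two-sided p-value, and approximate 95% confidence
    interval for a single logistic regression coefficient."""
    z = estimate / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # 2 * P(Z > |z|)
    return z, p_two_sided, (estimate - z_crit * se, estimate + z_crit * se)


z, p, (low, high) = wald_summary(0.674, 0.263)
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
# z = 2.56, p = 0.010, 95% CI = (0.16, 1.19)
```

On the log odds scale the interval runs from about 0.16 to 1.19. The data are therefore consistent with Ortiz's runs scoring only somewhat more often than the average player's.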

We can see from the low p-values in both tables that pinch running helps increase the chance of scoring, at least when leading or tied. The PR effect is actually much bigger than I imagined it would be. Apart from the large jump in probabilities from the first table to the second, there are a couple of interesting things. First, as the I.PR:lead interaction suggested, the PR effect is much smaller when the team is trailing; the raw probability of scoring is also smaller when trailing. I think the latter is easily explained by the fact that the opposing team's best pitchers tend to be their set-up man and closer, and these are the pitchers you face when trailing by one run in the 8th or 9th inning. A less likely explanation might be that if you are losing by one run, you are not hitting as well that day, and hence are less likely to score. This gets into the "hot hand effect" though, so I think the more likely explanation is the opposing pitching.

I don't think the smaller PR effect in games where the team is trailing is as intuitive, but it probably also has a lot to do with facing the opposing team's best pitching: the speed of your base runner is not important if the subsequent hitters are not getting hits. I actually fit a Poisson regression with the number of runs scored subsequent to the runner reaching base as the response, and interestingly the expected runs scored with the PR when trailing by one were lower than the expected runs with no PR. The effect was extremely small and not statistically significant, but with the size of my sample I wouldn't expect to see that. I wonder if it has to do with a different strategy being employed when trailing: the PR tries to steal 2nd base, or the manager tries to bunt him over to 2nd base. Either of these moves certainly decreases the expected runs scored in the inning (although it may increase the chance of scoring at least one run).

By not allowing for an interaction between I.PR and player, which would measure how fast the original base runner is, I've basically assumed that David Ortiz has the same speed as the average player (at least, average among those who have ever been pinch run for - which is almost everybody at one time or another). But the variation in each of these interaction estimates would be even bigger than the variation we're now seeing in the player estimates, so I didn't bother with this here. I will have to include it in my final analysis in part 3 though, perhaps averaging several players together to more precisely estimate an overall effect of pinch running for good hitters. Anyway, because Ortiz is unusually slow, you can probably safely take a little bit away from Ortiz's probabilities and add a little bit to his PR's.

This wraps up step 1. I've had to make some simplifications and assumptions to fit this logistic model (some of which I might have to try to relax in my next post), but at least the data have allowed me to show that pinch running significantly increases the chance of scoring.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".

Tuesday, June 29, 2010

Are pinch runners effective? Part 1 of 3

Late in a close game when a slow runner gets on base, a manager will often remove him in favour of a pinch runner (PR), hoping to score a vital run. This can be done when the team is ahead and trying to get an insurance run, when the team is tied and trying to take the lead, or when the team is behind and trying to tie the game. I'm skeptical of this strategy because although the chances of scoring in the current inning are no doubt increased, this spot in the batting order may well come up again later, and the superior hitter who used to occupy it will no longer be in the game. Here's a cursory discussion that says it may work though: Tango analysis.

The optimal strategy will likely depend on a few things:

how many outs there are - perhaps with two outs, it won't be worthwhile to use a PR because the runner, regardless of his speed, is so unlikely to score, but with no outs it will be worth it,
whether the runner is on 1st base or 2nd base (pinch running tends not to happen at 3rd base),
the run difference between the two teams,
the inning,
the hitting ability of the player who reached base, and the hitting ability of the other players in his team's lineup as well as the other team's lineup.

I've decided to break the problem into two steps, which will be the subject of my upcoming entries. This post will just serve as the introduction. The first step will be to estimate how much more likely the run is to score when the PR enters the game. The second step, which will be more complicated, will address the runs that are lost in future innings due to the decrease in the lineup's hitting ability. It's not yet clear to me what exactly must be estimated in step 2 in order to answer the overarching question. It will involve the probability of this spot in the order coming to bat again, which will of course depend on the current inning and score, and the estimation in this step may have to happen jointly with step 1 in order to get a sensible variance estimate.

To deal with step 1, I don't want to just blindly compare pinch running situations to situations with no PR - the lack of PR may indicate that the player who reached base is fast himself, and this will mitigate any true PR effect. I want to limit the sample to players who are sometimes pinch run for and sometimes not, and then control for player (and hence implicitly his team) in the model.

I'm going to limit the data to close games (within one run) in the late innings (8th or later). A gross PR effect (step 1) could be estimated without these restrictions (although pinch running in other situations may have more to do with resting a player or replacing an injured player), but the net effect that needs to be estimated in step 2 involves the manager's strategic decision to remove the man on base, believing either that this spot in the order is unlikely to reach the plate again, or that the immediacy of the potentially ensuing run takes precedence. I also want to limit the situations to PRs (or no PRs) when only one man is on base; this will keep the number of possible situations low - either there is a runner on 1st and nowhere else, or there is a runner on 2nd and nowhere else.

One other thing to note is that I'm not dealing with pitchers here. Deciding whether to pinch run for the pitcher is a completely different scenario, although it probably doesn't happen that much anyway because he'd sooner be pinch hit for.
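The situation-level restrictions above can be written as a simple filter. This is a minimal Python sketch over a hypothetical Situation record, and the field names are my own, not Retrosheet's. It assumes "within one run" means the batting team's score minus the fielding team's score is -1, 0, or 1.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Hypothetical record of the moment a runner is on base."""
    inning: int
    batting_score: int
    fielding_score: int
    on_1st: bool
    on_2nd: bool
    on_3rd: bool
    runner_is_pitcher: bool = False


def in_sample(s: Situation) -> bool:
    """Apply the restrictions: 8th inning or later, within one run, a lone
    runner on 1st or 2nd, and the runner is not the pitcher."""
    late = s.inning >= 8
    close = abs(s.batting_score - s.fielding_score) <= 1
    lone_runner = (s.on_1st != s.on_2nd) and not s.on_3rd  # exactly one of 1st/2nd
    return late and close and lone_runner and not s.runner_is_pitcher


# Trailing 3-4 in the 9th with a lone runner on 1st: kept in the sample
print(in_sample(Situation(9, 3, 4, True, False, False)))
```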

Fortunately, since I've previously dealt with the Retrosheet play-by-play data needed to set up the model, parsing the data shouldn't be too time-consuming.

Stay tuned for part 2!