Statistics in Baseball: October 2010

Tuesday, October 19, 2010

Big changes!

I've got a great new job, starting in January, for a sports betting company in London. This means that the blog entries will probably stop for good because my baseball modelling efforts will go into the job. I may still post once in a while about other sports that my employers don't bet on - like maybe curling!

Monday, October 11, 2010

The Home Run Derby effect?

Do players who participate in the All-Star Game Home Run Derby screw up their swing, go into a slump, and have a poor second half? I've heard this one from the talking heads before, and it sounds completely false to me. Mark McGwire used to put on a show in batting practice in 1998, and it didn't stop him from hitting 70 home runs.

The Home Run Derby data came from the MLB website, but they weren't in CSV or tab-separated format, so I had to do some manipulation. I would have liked to get data on just the few games following the HR Derby to check for slumps, but I settled for the first and second half splits (as determined by the All-Star break) from Baseball-Reference. I used the years 2003-2009.

OPS is a good measure of how effective a hitter is, so I thought it would be best to compare pre- and post-break numbers in terms of OPS (I tried some other things and they led to similar conclusions). Baseball-Reference has a statistic called sOPS+ which measures a player's OPS relative to the league. This controls for season, but since I was looking at first and second half differences, this wasn't too important. I tried it anyway, and it gave almost the exact same results as OPS, so I stuck with OPS.

Although some players appeared in more than one HR Derby between 2003 and 2009, I assumed that the 56 differences between pre-break OPS and post-break OPS were independent. The differences looked Gaussian. The average pre-break OPS was .958 and the average post-break OPS was .924, a drop of .034, which gives a one-sided p-value of 0.02 in the paired t-test - so it's true, the participants do worse after the break! This is reinforced by the fact that the mean career pre- and post-break OPS for these players are not significantly different - the decrease seems to happen specifically in the years the players compete in the HR Derby.
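For readers who want to try the paired comparison themselves, here is a minimal Python sketch. The six OPS pairs are made up for illustration and are not the 56 differences from the post. The p-value uses a normal approximation rather than the t distribution, which is rough for six pairs but close for a sample of 56. The function names are my own.

```python
import math
import statistics


def paired_t(pre, post):
    """Return (mean difference, t statistic, degrees of freedom) for pre - post."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample standard deviation.
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / se, n - 1


def one_sided_p_normal(t_stat):
    """Approximate P(T >= t_stat) with the standard normal distribution.

    This is an approximation: an exact paired t-test would use the
    t distribution with n - 1 degrees of freedom instead.
    """
    return 1.0 - statistics.NormalDist().cdf(t_stat)


# Hypothetical pre- and post-break OPS values for six players (not real data).
pre_ops = [0.950, 1.010, 0.905, 0.980, 0.930, 0.990]
post_ops = [0.910, 0.985, 0.915, 0.930, 0.900, 0.960]

mean_diff, t_stat, df = paired_t(pre_ops, post_ops)
p_approx = one_sided_p_normal(t_stat)
print(f"mean drop = {mean_diff:.4f}, t = {t_stat:.2f}, df = {df}, approx p = {p_approx:.4f}")
```

The test is one-sided because the claim being checked is specifically that OPS goes down after the break.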

But... how are hitters selected for the HR Derby? By having a very good first half. The participants are often hitters who have done unusually well in the first half, and were heading for a drop-off in the second half whether they took part in the derby or not. I can think of two ways to get around this and answer the question of whether the HR Derby causes the poorer second half. One is to compare the second half OPS in HR Derby years to the second half OPS in non-HR Derby years, and the other is to see if players who take part in more rounds of the derby have a bigger second half drop-off than players who are eliminated early. (I could also look at the second half drop-off for players in the All-Star Game who weren't in the HR Derby, but it was enough of a pain getting the data for just these players, so I'll try to avoid this approach.)
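The selection argument can be illustrated with a small simulation. This is only a sketch under assumed numbers: the league OPS of .750, the spreads for talent and half-season luck, and the function name are all hypothetical. Each simulated player has a fixed true talent, both halves are talent plus independent noise, and nobody takes part in any derby.

```python
import random
import statistics


def simulate_selection(n_players=2000, n_selected=8, league_ops=0.750,
                       talent_sd=0.060, noise_sd=0.060, seed=1):
    """Pick the best first halves and compare their first and second halves.

    All parameter values are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        talent = rng.gauss(league_ops, talent_sd)
        first_half = talent + rng.gauss(0.0, noise_sd)
        second_half = talent + rng.gauss(0.0, noise_sd)
        players.append((first_half, second_half))

    # "Invite" the players with the highest first-half OPS.
    selected = sorted(players, key=lambda p: p[0], reverse=True)[:n_selected]
    first_mean = statistics.mean(p[0] for p in selected)
    second_mean = statistics.mean(p[1] for p in selected)
    return first_mean, second_mean


first_mean, second_mean = simulate_selection()
print(f"selected first half OPS:  {first_mean:.3f}")
print(f"selected second half OPS: {second_mean:.3f}")
```

The selected group is still well above league average in the second half, but it falls back from its first-half level, because part of what got those players selected was luck.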

The mean career second half OPS of the 56 HR Derby hitters is .894, and in HR Derby years it is .924, so their second halves in derby years are actually better than their career norm. This is still a bit unsatisfactory because the HR Derby year is presumably in the prime of their career, so let's try the second way. Consider the number of swings taken in the competition by each player; this is equal to ten times the number of rounds they were in, plus their HR total. Fitting a linear regression of decrease in OPS on number of HR Derby swings, it is apparent that the more swings a player takes, the smaller the drop from pre-break to post-break OPS, i.e. the opposite of the proposed effect. So I'm pretty comfortable writing the poorer second halves off to selection bias.
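The swing count and the regression can be sketched in a few lines of Python. The swing formula is the one described above; the rows of data are invented for illustration, and the helper names are my own. A negative slope means more swings go with a smaller OPS drop.

```python
def derby_swings(rounds, home_runs, outs_per_round=10):
    """Swings in the derby: ten per round entered, plus home runs hit."""
    return outs_per_round * rounds + home_runs


def ols_fit(x, y):
    """Least-squares slope and intercept for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical rows: (rounds entered, derby HR, pre-break OPS minus post-break OPS).
sample = [
    (1, 3, 0.060),
    (1, 7, 0.045),
    (2, 12, 0.040),
    (2, 15, 0.020),
    (3, 24, 0.010),
    (3, 32, 0.005),
]

swings = [derby_swings(rounds, hr) for rounds, hr, _ in sample]
drops = [drop for _, _, drop in sample]

slope, intercept = ols_fit(swings, drops)
print(f"swings: {swings}")
print(f"slope = {slope:.5f} OPS points per swing, intercept = {intercept:.3f}")
```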

Thanks to Bret Hanlon for the idea for this post.

Sunday, October 10, 2010

Starting pitchers in their last inning of the game

In my previous post I compared the FIP for starting pitchers in their final inning of the game to the FIP of their team's bullpen, and found that they're being left in the game too long. Today I'm just going to present three tables that I created from the R code used to make that post. I limited the data to the AL in 2009 in the previous post because the bullpen data had to be collated manually, but no such barrier exists this time, so I used both the AL and NL from 2002-2009, limiting the data to pitchers who started at least 50 games over that time.
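As a reminder of what goes into these numbers, here is a sketch of the commonly used FIP formula in Python (the tables themselves came from R). The constant of 3.10 is an assumption: it varies by season and league, usually somewhere around 3.1 to 3.2, and the exact variant behind the tables is not stated here. The stat lines in the example are invented.

```python
def fip(hr, bb, hbp, so, ip, constant=3.10):
    """Fielding Independent Pitching from home runs, walks, hit batters, strikeouts.

    ip is innings pitched as a decimal number (e.g. 6 and 1/3 innings = 6.333...).
    The constant is an assumed league adjustment, not a fixed value.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip + constant


# Hypothetical pitcher: all innings before his last one in each start...
early = fip(hr=20, bb=60, hbp=5, so=150, ip=200)
# ...versus only the final inning of each start.
late = fip(hr=6, bb=15, hbp=1, so=18, ip=30)

print(f"early FIP = {early:.3f}, late FIP = {late:.3f}")
```

Because home runs carry a weight of 13, a handful of extra homers in a small number of final innings is enough to push the late FIP up sharply.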

This first table looks at the pitchers who had the most significant drop-off (all of them had p-values less than 10^(-6)) from their non-final inning numbers to their final inning numbers. 233 of the 241 pitchers in the sample had a significant drop-off (p-values less than 0.05), but these were the most severe. I don't know how much this difference really means because the important number for managerial decisions is the late FIP, but many of these guys are good pitchers, and are probably pitching in close games pretty often (unlike somebody who has a really high FIP throughout the game), and their team would benefit if the manager got them out one inning early. Of course knowing when the pitcher is going to start getting knocked around in a given game is impossible - sometimes they might be getting pulled in the 8th, other times in the 7th, etc. However, it seems managers should be especially watchful with these pitchers, and get them out of the game as soon as they're showing even a slight decrease in velocity.

pitcher	early FIP	late FIP
Burnett, A.J.	3.351	5.907
Byrd, Paul	4.182	7.196
Fogg, Josh	4.529	8.090
Garland, Jon	4.210	7.301
Hernandez, Livan	4.039	6.670
Lackey, John	3.538	5.755
Lohse, Kyle	4.332	7.152
Meche, Gil	3.940	6.950
Ortiz, Russ	4.183	7.639
Pavano, Carl	3.536	7.049
Perez, Oliver	4.176	7.506
Robertson, Nate	4.160	7.473
Santana, Johan	2.915	5.337
Silva, Carlos	4.143	7.384
Trachsel, Steve	4.444	7.597
Wakefield, Tim	4.149	7.193

This table is less interesting, but these were the only eight pitchers who didn't have a significant last inning FIP increase.

pitcher	early FIP	late FIP	p-value
Williams, David	5.131	6.063	0.187
Smoltz, John	3.114	3.827	0.089
Litsch, Jesse	4.510	5.820	0.085
Ryan, Brendan	4.148	5.208	0.073
Kuroda, Hiroki	3.400	4.561	0.071
Santos, Victor	4.556	5.713	0.066
Galarraga, Armando	4.760	6.435	0.056
Hammel, Jason	3.976	5.410	0.050

The final table shows the nine pitchers who had a last inning FIP over 9.00. I don't think any of them are still starting games. Rick Reed had a good career and my data set just caught the tail end of it. Most have been tried as relief pitchers, and Darren Oliver has actually become a pretty good one.

pitcher	early FIP	late FIP
James, Chuck	4.240	10.827
McClung, Seth	4.669	10.280
Oliver, Darren	5.053	9.661
Kinney, Matt	4.037	9.640
Waechter, Doug	4.830	9.609
Reed, Rick	3.531	9.534
Helling, Rick	4.162	9.433
Owings, Micah	4.872	9.243
Mays, Joe	4.595	9.032

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".