Tuesday, June 29, 2010

Are pinch runners effective? Part 1 of 3

Late in a close game when a slow runner gets on base, a manager will often remove him in favour of a pinch runner (PR), hoping to score a vital run. This can be done when the team is ahead and trying to get an insurance run, when the team is tied and trying to take the lead, or when the team is behind and trying to tie the game. I'm skeptical of this strategy because although the chances of scoring in the current inning are no doubt increased, this spot in the batting order may well come up again later, and the superior hitter who used to occupy it will no longer be in the game. That said, here's a cursory discussion suggesting it may work: Tango analysis.

The optimal strategy will likely depend on a few things:
  • how many outs there are - perhaps with two outs, it won't be worthwhile to use a PR because the runner, regardless of his speed, is so unlikely to score, but with no outs it will be worth it,
  • whether the runner is on 1st base or 2nd base (pinch running tends not to happen at 3rd base),
  • the run difference between the two teams,
  • the inning,
  • the hitting ability of the player who reached base, and the hitting ability of the other players in his team's lineup as well as the other team's lineup.
I've decided to break the problem into two steps, which will be the subject of my upcoming entries. This post will just serve as the introduction. The first step will be to estimate how much more likely the run is to score when the PR enters the game. The second step, which will be more complicated, will address the runs that are lost in future innings due to the decrease in the lineup's hitting ability. It's not yet clear to me what exactly must be estimated in step 2 in order to solve the overarching question. It will involve the probability of this spot in the order coming to bat again, which will of course depend on the current inning and score, and the estimation in this step may have to happen jointly with step 1 in order to get a sensible variance estimate.

To deal with step 1, I don't want to just blindly compare pinch running situations to situations with no PR. The absence of a PR may indicate that the player who reached base is fast himself, which would make the comparison understate any true PR effect. I want to limit the sample to players who are sometimes pinch run for and sometimes not, and then control for player (and hence implicitly his team) in the model.
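As a rough sketch of that sample restriction (not the actual parsing code; the record layout and the function name are made up for illustration), each baserunning situation is assumed to reduce to a pair of a player ID and a flag for whether he was pinch run for:

```python
from collections import defaultdict


def players_with_both_outcomes(records):
    """Return the players who appear both with and without a pinch runner.

    `records` is a hypothetical iterable of (player_id, was_pinch_run_for)
    pairs, one per qualifying baserunning situation.
    """
    outcomes_seen = defaultdict(set)
    for player_id, was_pinch_run_for in records:
        outcomes_seen[player_id].add(bool(was_pinch_run_for))
    # Keep only players observed in both states.
    return {p for p, seen in outcomes_seen.items() if seen == {True, False}}


# Toy data, invented for illustration only.
sample = [("a", True), ("a", False), ("b", False), ("c", True)]
eligible = players_with_both_outcomes(sample)
```

Only players in the returned set would stay in the sample, so that a per-player term in the model has both kinds of observation to compare.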

I'm going to limit the data to close games (within one run) in the late innings (8th or later). A gross PR effect (step 1) could be estimated without these restrictions (although pinch running in other situations may have more to do with resting a player or replacing an injured player), but the net effect that needs to be estimated in step 2 involves the manager's strategic decision to remove the man on base, believing either that this spot in the order is unlikely to reach the plate again, or that the immediacy of the potentially ensuing run takes precedence. I also want to limit the situations to PRs (or no PRs) when only one man is on base; this will keep the number of possible situations low - either there is a runner on 1st and nowhere else, or there is a runner on 2nd and nowhere else.

One other thing to note is that I'm not dealing with pitchers here. Deciding whether to pinch run for the pitcher is a completely different scenario, although it probably doesn't happen that often anyway because a manager would sooner pinch hit for him.
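The restrictions above (8th inning or later, score within one run, a lone runner on 1st or 2nd, and a runner who isn't the pitcher) can be written as a simple filter. This is only a sketch: the `Situation` fields and the `qualifies` function are hypothetical names, not anything taken from Retrosheet's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    # Hypothetical record layout, assumed for illustration.
    inning: int
    run_diff: int            # batting team's runs minus fielding team's runs
    runner_on_1st: bool
    runner_on_2nd: bool
    runner_on_3rd: bool
    runner_is_pitcher: bool


def qualifies(s: Situation) -> bool:
    """True if the situation falls inside the study's sample restrictions."""
    late = s.inning >= 8
    close = abs(s.run_diff) <= 1
    # Exactly one of 1st/2nd occupied, and nobody on 3rd.
    lone_runner = (s.runner_on_1st != s.runner_on_2nd) and not s.runner_on_3rd
    return late and close and lone_runner and not s.runner_is_pitcher
```

For example, a runner alone on 1st in the 9th inning of a game his team trails by one would pass, while the same runner in the 7th inning would not.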

Fortunately, I've previously dealt with the Retrosheet play-by-play data needed to set up the model, so parsing the data shouldn't be too time-consuming.

Stay tuned for part 2!