The optimal strategy will likely depend on a few things:
- how many outs there are - perhaps with two outs, it won't be worthwhile to use a PR because the runner, regardless of his speed, is so unlikely to score, but with no outs it will be worth it,
- whether the runner is on 1st base or 2nd base (pinch running tends not to happen at 3rd base),
- the run difference between the two teams,
- the inning,
- the hitting ability of the player who reached base, and the hitting ability of the other players in his team's lineup as well as the other team's lineup.
To deal with step 1, I don't want to just blindly compare pinch running situations to situations with no PR: the lack of a PR may indicate that the player who reached base is fast himself, and this would mask any true PR effect. Instead, I want to limit the sample to players who are sometimes pinch run for and sometimes not, and then control for the player (and hence implicitly his team) in the model.
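As a rough sketch of that sample restriction, something like the following could work. The field names `runner_id` and `pinch_run` are hypothetical (they are not Retrosheet field names), and the handful of events are made up purely to show the logic.

```python
from collections import defaultdict


def players_with_both_outcomes(events):
    """Return ids of players who were pinch run for in some events and not in others."""
    outcomes_by_player = defaultdict(set)
    for ev in events:
        # Record whether this player was (True) or was not (False) pinch run for.
        outcomes_by_player[ev["runner_id"]].add(bool(ev["pinch_run"]))
    # Keep only players for whom we have seen both outcomes.
    return {pid for pid, seen in outcomes_by_player.items() if seen == {True, False}}


# Toy, made-up events; the keys are assumptions for illustration only.
events = [
    {"runner_id": "A", "pinch_run": True},
    {"runner_id": "A", "pinch_run": False},
    {"runner_id": "B", "pinch_run": False},  # never pinch run for: dropped
    {"runner_id": "C", "pinch_run": True},   # always pinch run for: dropped
]

eligible = players_with_both_outcomes(events)
sample = [ev for ev in events if ev["runner_id"] in eligible]
```

The player id would then enter the model as a control, which only makes sense for players who appear with both outcomes.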
I'm going to limit the data to close games (within one run) in the late innings (8th or later). A gross PR effect (step 1) could be estimated without these restrictions, although pinch running in other situations may have more to do with resting a player or replacing an injured one. The net effect that needs to be estimated in step 2, however, involves the manager's strategic decision to remove the man on base, believing either that this spot in the order is unlikely to come to bat again, or that the chance of scoring a run right now takes precedence. I also want to limit the situations to PRs (or no PRs) when only one man is on base. This will keep the number of possible situations low: either there is a runner on 1st and nowhere else, or there is a runner on 2nd and nowhere else.
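These restrictions amount to a simple filter over game situations. Here is a minimal sketch, assuming each situation is a dict with hypothetical keys (`inning`, `bat_score`, `fld_score`, `on_1b`, `on_2b`, `on_3b`) and assuming that "within one run" includes tied games.

```python
def in_sample(ev):
    """True if the situation is a close, late-inning one with a lone runner on 1st or 2nd."""
    # Close game: score difference of at most one run (ties included by assumption).
    close = abs(ev["bat_score"] - ev["fld_score"]) <= 1
    # Late innings: 8th or later, which also covers extra innings.
    late = ev["inning"] >= 8
    # Exactly one runner, standing on 1st or on 2nd, and nobody anywhere else.
    bases = (bool(ev["on_1b"]), bool(ev["on_2b"]), bool(ev["on_3b"]))
    lone_runner = bases in ((True, False, False), (False, True, False))
    return close and late and lone_runner


# Made-up situations, for illustration only.
situations = [
    {"inning": 9, "bat_score": 3, "fld_score": 4, "on_1b": True, "on_2b": False, "on_3b": False},
    {"inning": 8, "bat_score": 2, "fld_score": 2, "on_1b": False, "on_2b": True, "on_3b": False},
    {"inning": 6, "bat_score": 1, "fld_score": 1, "on_1b": True, "on_2b": False, "on_3b": False},
    {"inning": 9, "bat_score": 0, "fld_score": 5, "on_1b": True, "on_2b": False, "on_3b": False},
    {"inning": 9, "bat_score": 3, "fld_score": 3, "on_1b": True, "on_2b": True, "on_3b": False},
]

kept = [ev for ev in situations if in_sample(ev)]
```

Only the first two situations pass: the third is too early in the game, the fourth is not close, and the fifth has two men on base.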
One other thing to note is that I'm not dealing with pitchers here. Deciding whether to pinch run for the pitcher is a completely different scenario, although it probably doesn't happen that much anyway because he'd sooner be pinch hit for.
Fortunately, I've previously worked with the Retrosheet play-by-play data needed to set up the model, so parsing the data shouldn't be too time-consuming.
Stay tuned for part 2!