I've seen a couple of articles (here and here) that have approached this problem in a heuristic way and only for one closer. I picked the first 6 closers off the top of my head (Trevor Hoffman, Bobby Jenks, Joe Nathan, Jonathan Papelbon, Mariano Rivera, and Billy Wagner), and I'll do some inference to see whether any of them pitch significantly better in save situations. I've limited the analyses to seasons when they were full-time closers.

The incredible Baseball-Reference has split stats for every pitcher for every year, including all major performance indicators in save situations and in non-save situations, so getting the data into the right format was easy. The following graphs show the FIP and ERA for the 6 closers by season, with red indicating save situations and blue indicating non-save situations. I threw out a partial season for both Hoffman and Wagner (mostly to make the graphs look nicer).

I thought that if performance differed between the two situations, there might also be a time trend: perhaps the veteran closer has learned to perform better in non-save situations than the raw, young closer. But these graphs show no obvious interaction between time and situation for any closer, so for my statistical analyses I aggregated the stats across all years.

For each player/situation combination I estimated the per-plate-appearance probability of a home run (HR), a walk or hit by pitch (BB), a strikeout (K), and any other type of out (out). The usual FIP formula is (13HR + 3BB - 2K)/IP + 3.2, and IP is the number of outs (K + out) divided by 3. Dividing the numerator and denominator by PA, we see that FIP = 3(13P(HR) + 3P(BB) - 2P(K))/(P(K) + P(out)) + 3.2. Since FIP is now written as a function of the probabilities and we know the covariance of multinomial probabilities, the delta method can be applied to get the variance of each FIP estimate.
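As a rough sketch of that calculation (not necessarily the exact code behind the numbers in this post), the function below computes the FIP estimate and a delta-method standard error from aggregated counts. The function name `fip_from_counts` and the example counts are made up for illustration, and it assumes every plate appearance falls into exactly one multinomial category, with all remaining outcomes (non-HR hits and so on) in a leftover category that does not enter the formula.

```python
from math import sqrt


def fip_from_counts(hr, bb, k, out, pa):
    """Return (FIP estimate, delta-method standard error).

    hr, bb, k, out are counts of home runs, walks + HBP, strikeouts,
    and other outs; pa is total plate appearances. Outcomes not listed
    (e.g. non-HR hits) form a leftover category with zero gradient.
    """
    # Estimated multinomial probabilities per plate appearance
    p = [hr / pa, bb / pa, k / pa, out / pa]
    num = 13 * p[0] + 3 * p[1] - 2 * p[2]   # 13P(HR) + 3P(BB) - 2P(K)
    den = p[2] + p[3]                       # P(K) + P(out)
    fip = 3 * num / den + 3.2

    # Gradient of FIP with respect to (P(HR), P(BB), P(K), P(out))
    grad = [
        39 / den,
        9 / den,
        -6 / den - 3 * num / den ** 2,
        -3 * num / den ** 2,
    ]

    # Multinomial covariance of the estimated probabilities:
    # Var(p_i) = p_i(1 - p_i)/n, Cov(p_i, p_j) = -p_i p_j / n
    var = 0.0
    for i in range(4):
        for j in range(4):
            cov = (p[i] * (1 - p[i]) if i == j else -p[i] * p[j]) / pa
            var += grad[i] * cov * grad[j]

    return fip, sqrt(var)


# Hypothetical counts, not taken from any pitcher in this post
fip, se = fip_from_counts(hr=5, bb=20, k=80, out=120, pa=300)
print(f"FIP = {fip:.3f}, SE = {se:.3f}")
```

Running this once per pitcher for the save and non-save splits would give the two estimates and standard errors needed for the comparison.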

The following table shows the FIP estimates for save situations (SV) and non-save situations (NSV), along with the p-value, assuming normality of the estimates, for the one-sided test of FIP(SV) less than FIP(NSV).

| Pitcher | FIP(SV) | FIP(NSV) | p-value |
|---|---|---|---|
| Hoffman | 2.83 | 3.30 | 0.08 |
| Jenks | 3.31 | 3.57 | 0.33 |
| Nathan | 2.67 | 2.14 | 0.90 |
| Papelbon | 2.51 | 2.34 | 0.63 |
| Rivera | 2.71 | 3.06 | 0.12 |
| Wagner | 2.75 | 2.63 | 0.64 |
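For anyone who wants to reproduce this kind of p-value, here is a minimal sketch of a one-sided normal test. It assumes the save and non-save estimates are independent (they come from different plate appearances) and approximately normal. The function name `one_sided_p_value` is illustrative, and the standard errors in the example are hypothetical placeholders, since the post reports only the FIP estimates and p-values.

```python
from math import erf, sqrt


def one_sided_p_value(fip_sv, se_sv, fip_nsv, se_nsv):
    """P-value for H1: FIP(SV) < FIP(NSV), treating both estimates
    as independent and normally distributed."""
    # Standardized difference; negative when the pitcher is better in saves
    z = (fip_sv - fip_nsv) / sqrt(se_sv ** 2 + se_nsv ** 2)
    # Standard normal CDF evaluated at z
    return 0.5 * (1 + erf(z / sqrt(2)))


# FIP values from Hoffman's row; the standard errors are hypothetical
p = one_sided_p_value(fip_sv=2.83, se_sv=0.20, fip_nsv=3.30, se_nsv=0.25)
print(f"p-value = {p:.3f}")
```

A small p-value means the save-situation FIP is lower than chance alone would suggest, while a value near 1 (as for Nathan) means the pitcher was actually better in non-save situations.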

We can see that three pitchers (Nathan, Papelbon, and Wagner) actually pitch better in non-save situations, and even without controlling for multiple testing, no pitcher is significantly better in save situations at the 0.05 level. That doesn't mean this will hold for every closer (although I suspect any significant differences would be false positives), but for these 6 the myth has been broken!

The data used here were obtained from Baseball-Reference.