Saturday, March 24, 2018

Optimizing Lineups In the Most Painful Way Imaginable

Aaron Judge batting leadoff? Brandon Drury batting cleanup? What is this guy smoking?

If you haven't read Marc Carig's excellent piece on the Yankees' attempts to optimize their lineups, you can read it with an Athletic subscription here. If you don't have an Athletic subscription (first of all, wyd?), the lineups I generated for Carig as the most optimized and least optimized against lefties are shown below.


How did I come to these lineups? How good are they against lefties? In this article, I'll detail the exact methodology for generating these lineups along with some figures.

The first step is to identify the run environment that our lineup will be playing in. Run environments generally are fairly consistent from season to season, but given the recent and dramatic trend upwards in run scoring, I used the 2017 AL run environment for this work.

I pulled MLB play-by-play data for 2017 AL teams at home (with the DH in play) and generated an RE24 matrix for AL teams that season.

Runners  0 out  1 out  2 out
___      0.52   0.29   0.11
1__      0.89   0.54   0.23
_2_      1.12   0.69   0.32
12_      1.36   0.94   0.38
__3      1.49   0.94   0.44
1_3      1.73   1.19   0.50
_23      1.95   1.35   0.59
123      2.32   1.6    0.71
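For readers who want to see the mechanics, here is a minimal sketch of how a run expectancy matrix can be built: for every plate appearance, note the base-out state and the runs scored from that point to the end of the inning, then average by state. The record layout and the toy records are assumptions made up for this illustration; they are not the play-by-play data or the code used for the table above.

```python
from collections import defaultdict


def run_expectancy(events):
    """Average runs scored from each (runners, outs) state to the end of the inning.

    `events` is an iterable of (runners, outs, runs_rest_of_inning) tuples,
    one per plate appearance. This layout is a hypothetical one chosen for
    the sketch, not the format of any real play-by-play feed.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for runners, outs, runs_rest in events:
        totals[(runners, outs)] += runs_rest
        counts[(runners, outs)] += 1
    return {state: totals[state] / counts[state] for state in totals}


# Made-up toy records, for illustration only.
toy_events = [
    ("___", 0, 0), ("___", 0, 1), ("___", 0, 0), ("___", 0, 1),
    ("1__", 1, 0), ("1__", 1, 2),
]

re_matrix = run_expectancy(toy_events)
print(re_matrix[("___", 0)])  # 0.5
print(re_matrix[("1__", 1)])  # 1.0
```

With a full season of records, the same averaging fills in all 24 base-out states.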

I then used a similar methodology to the one used by Tango, Lichtman, and Dolphin in Chapter 5 of The Book to break down the run values of different plate outcomes (generic out, strikeout, walk, single, double, etc.) depending on where in the lineup each occurred.

Batting Slot  Generic Out  Strikeout  Walk  Hit By Pitch  Single  Double  Triple  Home Run
1             -0.27        -0.26      0.31  0.33          0.42    0.71    1.15    1.30
2             -0.27        -0.27      0.32  0.31          0.45    0.70    0.96    1.36
3             -0.27        -0.26      0.29  0.33          0.44    0.74    1.10    1.40
4             -0.29        -0.28      0.30  0.32          0.46    0.77    1.10    1.42
5             -0.29        -0.29      0.30  0.34          0.46    0.78    1.04    1.41
6             -0.28        -0.27      0.31  0.33          0.43    0.72    1.11    1.40
7             -0.28        -0.27      0.30  0.33          0.45    0.71    0.99    1.43
8             -0.28        -0.28      0.31  0.31          0.45    0.73    0.97    1.43
9             -0.27        -0.28      0.31  0.36          0.44    0.74    1.04    1.32

We can check our results intuitively. For example, since lead-off hitters come to the plate most frequently without runners on, an out (strikeout or generic) costs lead-off hitters no more than it costs any other lineup spot, because it's not stranding runners. Meanwhile, since cleanup hitters come to the plate more often with runners on, home runs are worth more to them than to lead-off hitters (1.42 runs versus 1.30).

The final step is to look at how frequently each lineup slot comes to the plate over the course of a season. The average number of plate appearances for a player in each batting slot is shown below.

Batting Order  PA
1              757
2              738
3              720
4              705
5              687
6              670
7              652
8              633
9              613

Now, we grab our projections. For the Yankees article, I got the Yankees' projected values against LHP from Steamer (thanks, Eno and Jared!) and broke their projections down into per-PA rates for each plate outcome (BB/PA, HR/PA, etc.). Then, I calculated the Runs Above Average per PA (RAA/PA) value for each player in each lineup slot (Gary Sanchez is worth .0349 RAA/PA batting second, but only .0293 RAA/PA batting leadoff).
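One plausible way to read this step is as a weighted sum: multiply each outcome's per-PA rate by that outcome's run value in a given slot and add the products. The sketch below does that for slots 1 and 2 using the run values from this article. The hitter's rates are made up for illustration (they are not Steamer projections), and the function name is my own.

```python
# Outcome order matches the batting-slot run value table in this article.
OUTCOMES = ["out", "strikeout", "walk", "hbp", "single", "double", "triple", "home_run"]

# Run values for batting slots 1 and 2, copied from that table.
SLOT_RUN_VALUES = {
    1: [-0.27, -0.26, 0.31, 0.33, 0.42, 0.71, 1.15, 1.30],
    2: [-0.27, -0.27, 0.32, 0.31, 0.45, 0.70, 0.96, 1.36],
}

# Made-up per-PA rates for a hypothetical hitter (NOT real projections).
# They sum to 1 so that every plate appearance ends in exactly one outcome.
example_rates = [0.40, 0.25, 0.13, 0.01, 0.12, 0.04, 0.00, 0.05]


def raa_per_pa(rates, run_values):
    """Expected run value of one plate appearance: sum of rate * run value."""
    return sum(rate * value for rate, value in zip(rates, run_values))


for slot in (1, 2):
    print(slot, round(raa_per_pa(example_rates, SLOT_RUN_VALUES[slot]), 4))
```

For this made-up hitter the sketch gives about .0144 RAA/PA batting leadoff and .0192 batting second, the same kind of slot-to-slot gap as in the Sanchez example.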

Finally, I generated potential defensive configurations for the Yankees and then generated all possible permutations of the lineups (all 362,880 of them). Then, using the expected PA over the course of a season, coupled with each player's projected RAA/PA for their spot, I calculated the RAA value for each lineup over the course of a season.
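The brute-force search can be sketched in a few lines. To keep the example small enough to check by hand, it uses three hypothetical hitters with made-up RAA/PA values and only the first three batting slots; the real search runs the same loop over nine hitters, which is where the 9! = 362,880 orders come from. All names and numbers except the PA figures are assumptions for illustration.

```python
from itertools import permutations
from math import factorial

# Average season PA for batting slots 1-3, from the table in this article.
PA_BY_SLOT = [757, 738, 720]

# Made-up RAA/PA by slot for three hypothetical hitters (illustration only).
RAA_PER_PA = {
    "A": [0.030, 0.035, 0.033],
    "B": [0.010, 0.008, 0.009],
    "C": [-0.020, -0.025, -0.022],
}


def season_raa(order):
    """Season RAA for a batting order: slot PA times the hitter's RAA/PA in that slot."""
    return sum(
        PA_BY_SLOT[slot] * RAA_PER_PA[player][slot]
        for slot, player in enumerate(order)
    )


lineups = list(permutations(RAA_PER_PA))  # every ordering of the hitters
best = max(lineups, key=season_raa)
worst = min(lineups, key=season_raa)

print(len(lineups), "orders for 3 hitters;", factorial(9), "for 9 hitters")
print("best: ", best, round(season_raa(best), 2))    # ('B', 'A', 'C') 17.56
print("worst:", worst, round(season_raa(worst), 2))  # ('A', 'C', 'B') 10.74
```

Because the total is just a sum over slots, the same problem could also be solved as an assignment problem without enumerating every order, but plain enumeration is easy to verify and fast enough at this size.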

The result? Optimized lineups based on these Steamer projections. Here are the best and worst lineup configurations, with their run values alongside them.

Lineup Order  Best Order         RAA     Worst Order        RAA
1             Aaron Judge        30.39   Didi Gregorius     -29.43
2             Giancarlo Stanton  73.17   Tyler Wade         -40.83
3             Gary Sanchez       26.9    Aaron Hicks        -5.69
4             Brandon Drury      -4.95   Brett Gardner      -22.89
5             Aaron Hicks        -10.75  Gregory Bird       -5.97
6             Gregory Bird       -5.92   Brandon Drury      -7.44
7             Brett Gardner      -21.39  Gary Sanchez       20.79
8             Didi Gregorius     -25.47  Aaron Judge        26.23
9             Tyler Wade         -35.60  Giancarlo Stanton  56.72
Net RAA Value                    26.38   Net RAA Value      -8.51

So, Judge is best used at the top of the order because his strikeouts hurt less and his walks help more. Stanton bats 2nd because, as the projected best hitter in the lineup against LHP, he gains the most in terms of additional PA and baserunners ahead of him.

The Net RAA Value represents how many runs above or below average the Yankees would score with a given lineup over the course of a full season. So if the Yankees were to face an LHP for 162 games and only use the best lineup, they would score 26.38 more runs than an average 2017 AL lineup, or 789 runs. If they used the worst lineup, they would score 8.51 fewer runs than average, or 754.

Considering that the Yankees don't face LHP over the entire season, the advantage is even smaller. Teams face LHP about 20-25% of the time, so rolling out the optimized lineup instead of the least optimized lineup is worth only about 7-8 RAA over the course of the season. And considering that a team would never run out a lineup like the worst lineup shown, the true advantage is only something like 3-4 RAA. Almost insignificant. But - as Carig says, "the Yankees refuse to settle" - they're not willing to let advantages like this pass them by.
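The arithmetic behind that estimate is easy to reproduce. The short sketch below takes the two Net RAA Values from the lineup table and scales the full-season gap by the 20-25% share of games against LHP; the variable and function names are my own.

```python
# Net RAA Values for the best and worst lineups, from the table in this article.
BEST_RAA = 26.38
WORST_RAA = -8.51

# Gap if the team faced LHP in all 162 games.
full_season_gap = BEST_RAA - WORST_RAA


def gap_vs_lhp(share_of_games):
    """Best-vs-worst gap scaled to the share of the season played against LHP."""
    return round(full_season_gap * share_of_games, 2)


print(round(full_season_gap, 2))  # 34.89
print(gap_vs_lhp(0.20))           # 6.98
print(gap_vs_lhp(0.25))           # 8.72
```

That range is roughly in line with the 7-8 RAA figure above, before the further discount for the fact that no team would actually use the worst lineup.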
