Sunday, February 17, 2019

Overwatch League Fantasy: Should you start players in sweeps?

In fantasy football, team match-ups play a huge role in deciding when to start and bench players - fantasy owners usually started wide receivers against the Jets (who allowed an average of 28.5 fantasy points per game to WRs in 2018) but tended to sit anyone who wasn't a superstar against the Jaguars (who allowed just 16.8 points per game to WRs in 2018, best in the NFL). Overwatch fantasy is little different - fantasy owners should pay close attention to who their players' teams are facing, because team strength can play a huge role in your fantasy points.

Should you start a player even when you think their team will get curb-stomped? How about when you think their team will do the curb-stomping? What about betting on a crucial fifth map, where players can boost their play time with some extra minutes? Here's your guide to figuring out who to start, and when, in the Overwatch League based on how you think teams will match up.

A brief note on terminology
  • The word "Map" is used to refer to a single discrete instance of team competition within an Overwatch League game, or "Match". An example of a map might be Busan, Dorado, or Route 66.
  • The word "Match" is used to refer to multiple maps which are played in a game. Teams generally play four maps in a single match but may play a fifth map should the teams be tied after playing four maps.

Part One: Establishing Baselines

To determine when you should be starting and sitting players in matchups, we'll first need to establish baselines for fantasy play. We'll use OWL stats from 2018's regular season with HighNoon.GG's fantasy scoring system. Note that these stats come from a 2/2/2 meta rather than a GOATS meta, but most of the same principles hold true, as we are generally taking the view from 20,000 feet rather than breaking the stats down by hero choice or role.

We'll first calculate the average number of fantasy points accumulated per game. Across Overwatch League stage play in 2018, the league collectively recorded the following totals:

Total Damage     Total Elims   Total Heals
130,285,657.71   212,440       48,777,512.61

HighNoon.GG awards fantasy points based on these metrics as follows:
  • 1 point per 1,000 damage recorded
  • 1 point per 1,000 healing recorded
  • 0.5 points per elimination recorded
The breakdown of fantasy points recorded leaguewide in each category is as follows:

Damage Fantasy Points   Eliminations Fantasy Points   Healing Fantasy Points
130,285.66              106,220.00                    48,777.51

Thus, a total of 285,283.17 fantasy points were recorded by OWL players in 2018. The Overwatch League had 250 matches during stage play, with 12 players fielded across both teams at any given time - so the average was 95.09 fantasy points per player-slot per match. That 95.09 is our overall baseline for comparison. The breakdown of average points per category per match per player-slot is as follows:

Damage Fantasy Points   Eliminations Fantasy Points   Healing Fantasy Points
43.43                   35.41                         16.26
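As a quick sanity check, the baseline arithmetic can be reproduced in a few lines of R:

```r
# Reproducing the league-wide baseline from the totals above.
damage_pts <- 130285657.71 / 1000  # 1 point per 1,000 damage
heal_pts   <- 48777512.61 / 1000   # 1 point per 1,000 healing
elim_pts   <- 212440 * 0.5         # 0.5 points per elimination

total_pts <- damage_pts + elim_pts + heal_pts  # ~285,283.17
total_pts / (250 * 12)                         # ~95.09 points per player-slot per match
```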

Part Two: Winners and Losers

One consideration in starting/sitting decisions might be who is expected to win and lose a game. In general, winning teams recorded an average of 101.76 fantasy points per match per role slot, while losing teams recorded 88.42. This is patently obvious - fantasy points generally measure positive objectives, and a team reaches those objectives frequently en route to winning.

However, it is of note that this differential comes almost entirely from eliminations.
                Damage Fantasy Points   Eliminations Fantasy Points   Healing Fantasy Points
Average         43.43                   35.41                         16.26
Losing Teams    42.04                   30.19                         16.19
Winning Teams   44.82                   40.62                         16.33

There is a more-than-ten-point spread in eliminations between winning and losing teams, but under a three-point spread in damage and less than a 0.2-point spread in healing. This result suggests that for main-supports whose value comes largely from healing, such as Unkoe, Gido, and Revenge, the expected outcome of the match need not factor heavily into the decision to start or sit.

A quick check of last year's fantasy point totals confirms this. Among players with at least 300 minutes played in both wins and losses, the following had the smallest differential in fantasy points per game between wins and losses (% of Points from Healing is each player's rate across both wins and losses):
Player      Points/Game in Wins   Points/Game in Losses   Differential   % of Points from Healing
Closer      47.00                 55.33                   -8.33          77.2%
Mistakes    89.56                 95.13                   -5.56          5.1%
Bani        45.09                 50.63                   -5.53          91.1%
sinatraa    81.27                 85.40                   -4.13          4.6%
Libero      89.92                 93.75                   -3.83          1.4%
Kellex      66.37                 67.53                   -1.16          82.2%
Hydration   79.00                 80.13                   -1.13          5.2%
Moth        62.08                 62.08                   0.00           81.9%
Gesture     86.64                 86.12                   0.52           0.2%
Coolmatt    89.77                 89.16                   0.61           0.1%

And the following players had the largest differentials:
Player       Points/Game in Wins   Points/Game in Losses   Differential   % of Points from Eliminations
ShaDowBurn   110.00                73.78                   36.22          44.5%
Eqo          112.19                79.33                   32.85          40.4%
Asher        79.93                 47.40                   32.53          48.7%
Seagull      115.90                85.67                   30.23          39.1%
Agilities    100.60                73.62                   26.98          37.9%
Envy         130.09                103.33                  26.76          47.8%
Carpe        112.86                88.25                   24.61          47.1%
NotE         117.23                92.83                   24.40          44.5%
Boombox      132.54                108.69                  23.85          25.8%
FLETA        105.05                81.50                   23.55          41.3%

Again, it is not unreasonable to expect players to perform worse against better opponents, but this confirms that it is harder for players - especially elim-heavy fantasy players - to rack up fantasy points in losses than in wins.

This indicates that expected match outcome is close to a non-factor in deciding whether to start a main-support, but it may be worthwhile to bench a DPS player who relies on eliminations and is walking into a probable loss in favor of an inferior DPS player on a team that expects to win.

Part Three: Expected Map Differential and Fantasy Points

There are a number of lines of thinking about starting fantasy players when taking team strength into consideration. We will define three rationalizations and then objectively examine them. Please do not read too much into my characterizations of each team - the point is not how I evaluate each team; these are simply names ascribed to examples of teams of fictional strength.

Example A: Curb Stomping
A fantasy owner owns Meko, a player for the notable powerhouse NYXL. NYXL's only match this week is against the Florida Mayhem, a fairly poor team that NYXL is expected to sweep with ease. The fantasy manager starts Meko on the grounds that Meko will pick up many points in an easy victory over an inferior Mayhem team. However, should this manager consider that these games may be over more quickly, thus robbing Meko of the chance to pick up more fantasy points?

Example B: Getting Curb-Stomped
A fantasy owner owns Geguri, a player for the fairly weak Shanghai Dragons. The Dragons' only match this week is against the Philadelphia Fusion - the runners-up in the 2018 Overwatch League championship, a very strong team this season, and overwhelming favorites. The fantasy manager starts Geguri on the basis that she is an excellent flex-tank. However, will Geguri's production suffer given that she is playing against a superior team and that, in a sweep, the games may be over more quickly?

Example C: The Even Match
A fantasy owner owns Shadowburn, a player for the middle-of-the-road Paris Eternal. The Eternal play the Atlanta Reign in their only match this week, and it is expected to be a close and tight game. Despite the fact that Shadowburn may be receiving a healthy degree of competition, the owner starts Shadowburn on the basis that the games will be long and drawn out, thus giving Shadowburn more time to accumulate fantasy points.

Which of these lines of thinking are logical? Let's examine how many fantasy points players in different map spreads tend to receive.

There are a number of map-differentials that teams might encounter. A clean sweep represents a 4-0, a somewhat closer match would result in a 3-1 win, and a tied match after four maps means that one team will be walking away with a 3-2 win. Individual maps can also end in draws, meaning that 3-0 and 2-1 final scores are possible as well.

From 2018, here are the average points per match per player slot for teams in these different situations:

Map Differential   Outcome   Points/Game
3 to 2             Winner    115.30
2 to 1             Winner    113.13
3 to 2             Loser     109.09
3 to 0             Winner    106.53
2 to 1             Loser     101.76
3 to 1             Winner    95.07
4 to 0             Winner    94.57
3 to 0             Loser     93.92
3 to 1             Loser     83.56
4 to 0             Loser     72.99

In general, 3-2 matches tend to be the most productive in terms of fantasy outcomes for both teams - indicating that scenario C represents the greatest potential for fantasy points in a vacuum. It is also apparent that 3-2 wins and losses drag the overall averages for wins and losses upward.

Starting a player in a game where the team is expected to win 3-1 or 4-0 represents an average fantasy opportunity, with both values appreciably close to the overall average for fantasy points per game per player slot. However, starting a player in a game that they might be expected to lose 4-0 means that they might stand to finish in excess of twenty points below average - a significant handicap.

However, evenly matched games present the greatest fantasy opportunity - a match that goes to a fifth map is worth roughly fourteen to twenty additional points per player slot across the two sides. These kinds of match-ups should be targeted: given two players of identical caliber, the correct play is to start the player in the more evenly matched game.

What about the risk of losing 3-1? The 3-1 loss column likely exaggerates the actual fantasy penalty for a team losing 3-1 to an evenly matched opponent, because that category includes many teams that faced a much stronger opponent yet managed to take a map off of them. But even if we assume, as a worst case, that the penalty is consistent across all teams, the expected value in fantasy points relative to the league average for a game between two evenly matched teams (such that P(Team A wins) = P(Team B wins) = 0.50) is as follows:

                               Team A loses 0-4   Team A loses 3-1   Team A loses 3-2   Team A wins 3-2   Team A wins 3-1   Team A wins 4-0   Total
Odds of Map Differential       6%                 25%                19%                19%               25%               6%                100%
Fantasy Points Above Average   -22.10             -11.53             14.00              20.21             -0.20             -0.52             N/A
Expected Value                 -1.38              -2.88              2.63               3.79              -0.05             -0.03             2.07

By a back-of-the-napkin calculation, it certainly appears as though starting players in evenly matched games is worth the risk of the 3-1 or 4-0 as our expected gain in terms of average points is positive (+2.07).
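The expected-value arithmetic is easy to reproduce in R. Note that the unrounded odds below (6.25%/25%/18.75%) are my assumption about what generated the rounded percentages in the table:

```r
# Assumed unrounded odds; they reproduce the rounded 6%/25%/19% above.
odds          <- c(0.0625, 0.25, 0.1875, 0.1875, 0.25, 0.0625)
pts_above_avg <- c(-22.10, -11.53, 14.00, 20.21, -0.20, -0.52)

sum(odds * pts_above_avg)  # ~2.07 expected points above average
```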

Why might players stand to gain so much from playing in close 3-2 matches? The answer appears to be match time: by virtue of that fifth map, 3-2 matches record significantly more play time than other match differentials.
Map Differential   Average Match Length (Min)
3 to 2             63.63
2 to 1             58.13
3 to 0             55.92
3 to 1             49.93
4 to 0             47.33

However, we should not discount the possibility of strength of competition driving point totals as well. In terms of rate stats, evenly matched teams post roughly average point rates against each other, whereas teams curb-stomping their opponents generate a high number of points per ten minutes (league average: 17.79 points per 10 minutes).
Map Differential   Outcome   Fantasy Points Per Slot Per 10 Min
4 to 0             Winner    20.03
2 to 1             Winner    19.47
3 to 0             Winner    19.05
3 to 1             Winner    19.04
3 to 2             Winner    18.12
2 to 1             Loser     17.51
3 to 2             Loser     17.15
3 to 0             Loser     16.80
3 to 1             Loser     16.76
4 to 0             Loser     15.42

Yet, as demonstrated above, the difference in rates of accumulation does not compensate for the brevity of four-map games.

It is also of note that matches with drawn maps tend to run longer and post similar points-per-ten figures - indicating that these games are quite close as well. However, map draws are rare and unpredictable enough that I am comfortable excluding them from the larger discussion in this analysis.

Conclusion: Notes on Synergy and Impact

There is a natural chicken-or-egg question here: do fantasy teams truly post high totals by virtue of winning, or do they simply accumulate high totals en route to winning, such that we are mistaking the symptom for the disease? Evenly matched teams clearly present an opportunity for additional points by virtue of the fifth map, yet winning teams tend to simply have better players overall, and that is ultimately what fantasy points measure, not simply wins.

The answer is that it is probably both the chicken and the egg. In baseball, it is rather easy for a good player to have an excellent performance in a losing effort - like Mike Trout going 2-for-3 with two doubles and a walk in an 8-2 loss - but that feat is more difficult to accomplish in Overwatch, especially in a GOATS meta where getting the first kill tends to result in the rest of the enemy team dying or running away. Winning is a function of player skill, but it is a function of multiple players' skills, and the players on a team ultimately affect each other's fantasy point totals. Janus was not an awful fantasy player with NYXL, but watching him falter against his former teammates while playing for a much weaker team on Saturday was a reminder that team strength plays an important role in fantasy, as does quality of opponent. Both factors feed into the discussion of winning/losing and map differential.

In that respect, consider these statistics to be overstated, but only to a degree. Yes, good fantasy players play for good teams, and teams tend to put up more points in fantasy wins. But at the same time, expectations regarding winning/losing can help predict fantasy stats. And by recognizing the potential for a 3-2 match, you might pick up some bonus points with ease.

Wednesday, October 31, 2018

Introducing rFIP and nFIP

One of the most valuable tools in all of baseball is pitch framing: stealing strikes for pitchers is severely underrated by metrics such as fWAR and rWAR, and from our few measurements of it, we know that an individual catcher's framing can be worth upwards of 20-30 runs over the course of a full season. Metrics like WARP assign this value to catchers, and while DRA adjusts pitcher performance to a degree to account for factors like framing, it is difficult to truly grasp the impact of pitch framing on a pitcher's performance. This is where rFIP and nFIP come in.

Background, rFIP

It's intuitive that pitchers who throw a lot of strikes will register a lot of strikeouts, and pitchers who throw a lot of balls will register a lot of walks. In this sense, the relationship between strikes per ball and strikeouts per walk is quite strong. On a team-by-team level, the ratio of strikes/ball to strikeouts/walk recorded an R² value of 0.9253 in 2018, indicating an extremely strong relationship between the two variables.


Since 2015, strikes/ball and strikeouts/walk have shown a strong relationship on a pitcher-by-pitcher level as well among qualified pitchers, albeit to a lesser degree (R² value of 0.6445).

Despite the slightly weaker pitcher-level relationship, both linear regressions reveal approximately the same formula for approximating strikeouts per walk from strikes per ball:

K/BB ≈ 7 × (Strikes/Balls) - 6

In other words, a pitcher's strikeouts per walk can be approximated reasonably well from the number of strikes and balls they record.
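For illustration, the underlying regression might be fit like this in R - a minimal sketch assuming a hypothetical data frame `qualified` with per-pitcher strikeout, walk, strike, and ball totals (the column names are mine, not from the original analysis):

```r
# Regress K/BB on strikes per ball; per the relationship above, the fit
# comes out to roughly K/BB = 7 * (strikes/balls) - 6.
fit <- lm(I(k / bb) ~ I(strikes / balls), data = qualified)
coef(fit)  # intercept ~ -6, slope ~ 7
```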

Knowing that a pitcher's strikes and balls are affected by pitch framing, we might seek to remove the influence of the catcher and umpire on strike and ball calls. To do so, we look solely at the number of pitches a pitcher throws in the zone (which are technically strikes according to MLB rules, but which umpire mistakes and pitch-framing efforts might turn into called balls) and the number of pitches thrown out of the zone (which are technically balls but may end up as called strikes thanks to framing). We can then contextualize this by calculating a pitcher's fielding independent pitching (FIP) from a strikeout-to-walk ratio derived from their balls and strikes as called by a robotic umpire (with Statcast serving as our robot) - this leads us to rFIP, robotic FIP.

We must operate with multiple caveats in calculating rFIP. The primary caveat is that we are assuming a two-dimensional strike zone, as Statcast does: Statcast defines the strike zone as the imaginary plane that runs perpendicular to the ground and parallel to the front edge of the plate, as shown in an illustration from MLB's official rulebook below.




However, baseball's strike zone is not two-dimensional but three-dimensional. According to Major League Baseball's official rules, "The STRIKE ZONE is that area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the hollow beneath the kneecap". Thus, the strike zone is not a plane but a prism. A more accurate representation of the strike zone, from Wikipedia, is shown below.



Our approximation of the strike zone will miss some borderline pitches that do, in fact, cross through the strike zone but do not touch the front plane. Still, the number of strikes and balls that our methodology misses will be quite small, as it is quite difficult to throw a pitch that is a strike while avoiding that front plane.

Our methodology also assumes that a pitcher would record the same sum of strikeouts and walks regardless of the ratio between them. In other words, we assume a pitcher would record the same number of balls in play, HBP, and HR in a season regardless of how many walks and strikeouts they yield - not a completely unfair assumption.

To demonstrate exactly how rFIP is calculated, I will work through an example using 2018 Mets starter Jacob deGrom, who by rFIP had the greatest single-season pitching performance of the past four seasons.

I went into Statcast's database and pulled deGrom's strikeouts, walks, hit-by-pitches, home runs, and IP outs to calculate deGrom's FIP. deGrom recorded 268 K, 43 BB, 5 HBP, 10 HR, and 643 IP outs in 2018 according to Statcast. Note that these values are slightly off from deGrom's actual totals - 269 K, 46 BB, 5 HBP, 10 HR, 651 IP outs - because Statcast has some missing values. Still, our calculated FIP for deGrom (1.94) is only marginally different from his actual FIP (1.99). This is our baseline FIP for deGrom.

I then pulled deGrom's total strikes and total walks, then I measured how many strikes were "stolen strikes" - that is to say, called strikes recorded as outside of the strike zone - and how many balls were "lost strikes" - called balls recorded as inside the strike zone. In 2018, deGrom recorded 1698 strikes, 80 of which were stolen strikes, and 999 balls, 54 of which were lost strikes. To calculate deGrom's strike total as called by a robotic umpire, I subtracted the number of stolen strikes from deGrom's total number of strikes and then added his lost strike total to that figure. deGrom's ball total as called by a robotic umpire was equivalent to the number of balls deGrom recorded minus his lost strike total plus his stolen strike total. deGrom's rStrikes (robotic strike calls) were 1672 (1698 - 80 + 54 = 1672) and his rBalls were 1025 (999 - 54 + 80 = 1025).

From there, I can use deGrom's rStrike/rBall ratio and the relationship between strikes per ball and strikeouts per walk to approximate deGrom's strikeouts per walk assuming a robotic umpire. deGrom's rStrike/rBall ratio was 1.63, so the equivalent strikeouts per walk ratio would be 7 × 1.63 - 6 = 5.41 (not far off from deGrom's 2018 figure of 5.85). deGrom recorded 268 + 43 = 311 strikeouts and walks in 2018, so his adjusted totals according to his rK/BB and the sum of his strikeouts and walks would be 263 strikeouts and 48 walks. Plugging these back into the FIP formula yields an rFIP of 2.07.
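A minimal R sketch of the whole deGrom calculation, using only the figures quoted above (the FIP constant is backed out from the baseline calculation rather than taken from any official source):

```r
# Baseline FIP inputs for 2018 deGrom, per Statcast.
k <- 268; bb <- 43; hbp <- 5; hr <- 10; ip <- 643 / 3

# Back out the FIP constant from the baseline FIP of 1.94.
fip_const <- 1.94 - (13 * hr + 3 * (bb + hbp) - 2 * k) / ip  # ~3.16

# Robotic strike/ball totals: undo stolen strikes, restore lost strikes.
r_strikes <- 1698 - 80 + 54  # 1672
r_balls   <- 999 - 54 + 80   # 1025

# Approximate K/BB from the strikes-per-ball relationship, then
# redistribute the fixed sum of K + BB accordingly.
r_kbb <- 7 * (r_strikes / r_balls) - 6          # ~5.41
r_k   <- round((k + bb) * r_kbb / (r_kbb + 1))  # ~263
r_bb  <- (k + bb) - r_k                         # ~48

(13 * hr + 3 * (r_bb + hbp) - 2 * r_k) / ip + fip_const  # rFIP, ~2.07 (up to rounding)
```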

The benefit of rFIP is that it isolates the influence of pitch framing, bad umpires, and other such factors, distilling a pitcher's ability to throw strikes while incorporating the pitcher's other tendencies (HBP and HR). It also identifies pitchers who do not throw many strikes but are saved from that tendency by their catchers (Zack Greinke is a prime example, running an extremely high FIP-minus-rFIP differential during his time in Arizona with Jeff Mathis as his personal catcher).

Background, nFIP

But Major League Baseball does not have robotic umpires at the moment, and given the current strength of the umpires' union, it might not for a very long time. In that sense, we might wish to know what a pitcher's performance would look like with a normal catcher/umpire duo. To accomplish this, we will use nFIP - normal FIP - which captures a pitcher's performance based on their strikes and balls as if they had an average catcher.

Since 2015, pitchers have seen 7.8% of their out-of-zone pitches converted from balls to strikes, and 5.4% of their zone pitches converted from strikes to balls. Following our deGrom example, deGrom's expected strikes with an average catcher/umpire duo would be 1672 rStrikes - 1672 rStrikes × 5.4% + 1025 rBalls × 7.8% = 1662 nStrikes, and deGrom's expected balls would be 1025 rBalls - 1025 rBalls × 7.8% + 1672 rStrikes × 5.4% = 1035 nBalls.
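Continuing the R sketch from above, the same conversion looks like this:

```r
# Apply league-average conversion rates to the robotic totals:
# 7.8% of out-of-zone pitches become strikes, 5.4% of zone pitches become balls.
n_strikes <- r_strikes * (1 - 0.054) + r_balls * 0.078  # ~1662
n_balls   <- r_balls * (1 - 0.078) + r_strikes * 0.054  # ~1035

n_kbb <- 7 * (n_strikes / n_balls) - 6
n_k   <- round((k + bb) * n_kbb / (n_kbb + 1))
n_bb  <- (k + bb) - n_k

(13 * hr + 3 * (n_bb + hbp) - 2 * n_k) / ip + fip_const  # nFIP, ~2.10
```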

Using the same methodology as rFIP for approximating K/BB from strikes and balls, deGrom's nFIP comes out to 2.10 - about a 0.16 difference from his actual FIP - which indicates that deGrom got a little bit of help from his catchers.

As a quick "stupid-check", we can also see that our league FIP (4.12) is extremely close to our league nFIP (4.17), so we know that our conversion methods are rather effective.

Advantages of rFIP and nFIP

rFIP is quite useful in that it gives us a solid idea of how effective a pitcher is at throwing strikes. In terms of reliability from year N to year N+1 among pitchers from 2015-2018 with at least 100 IP in both years, FIP and nFIP have roughly the same correlation (R² of 0.2607 and 0.2643, respectively), while rFIP has improved year-to-year reliability with an R² of 0.2860.

Neither rFIP nor nFIP predicts year N+1 FIP with much reliability (R² of 0.1796 and 0.1846, respectively), which is to be expected: since we have neutralized the impact of catching on a pitcher's numbers, the influence of the catcher is seen only in FIP.

Caveats

As an important reminder, rFIP and nFIP are simplified models that do not attempt to be one-hundred-percent accurate. We are approximating strike and ball calls, approximating a pitcher's strikeout and walk rates from those approximations, and assuming that a pitcher has no influence on pitch framing - which is almost certainly not entirely true. Still, the model passes the eye test in that it capably identifies terrific seasons and does not appear to unduly punish pitchers for their catchers' skill.

Leaders

A full leaderboard of rFIP and nFIP leaders since 2015, both split by season and career totals over that span, can be seen below. The default selection for the filter includes qualified pitching leaders.


Code

The SQL code used to calculate rFIP and nFIP from a Statcast database can be found here.

As always, if you have any questions, suggestions, or feedback, please leave a comment or tweet me @John_Edwards_.

Sunday, July 8, 2018

Effective Chase Score

Swinging at pitches outside the zone is generally bad. After all, hitters are essentially giving up free balls in exchange for either a strike or a poorly hit ball. But hey, if you can put the ball in play, it's not the worst thing in the world. With this in mind, yesterday I looked at which hitters were best at avoiding chasing those pitches while still making contact on the pitches they did chase.

I decided to refine this methodology further and look at players' ability to chase effectively, meaning that they A. don't chase frequently, B. make contact on pitches that they chase, and C. make quality contact on pitches that they chase.

The three components I incorporated were 1 − O-Swing% (how frequently players did not swing at outside pitches), O-Contact% (how frequently players make contact on their swings outside the zone), and xwOBA on O-zone pitches (the quality of contact on pitches outside the zone). After pulling these figures from Baseball Savant for players with 1000+ pitches faced this season, I calculated each player's z-score for each metric and added them together. I call the end result the "Effective Chase Score".
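A minimal sketch of that calculation in R, assuming a hypothetical data frame `savant` with the three component columns already pulled (the column names are mine, not Savant's):

```r
# z-score helper: standardize a metric across the player pool.
z <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Effective Chase Score: sum of the three standardized components.
savant$ecs <- z(1 - savant$o_swing) + z(savant$o_contact) + z(savant$o_xwoba)

# Top 20 by Effective Chase Score.
head(savant[order(-savant$ecs), c("player", "ecs")], 20)
```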

Here are 2018's leaders in Effective Chase Score.

Player              Effective Chase Score
Joey Votto          6.85
Mookie Betts        5.43
Brett Gardner       4.45
Alex Bregman        4.42
Nick Markakis       4.23
Jesse Winker        4.20
Ben Zobrist         3.61
Andrew Benintendi   3.56
Aaron Hicks         3.43
Jose Ramirez        3.40
Andrelton Simmons   3.36
Travis Shaw         3.22
Carlos Santana      3.15
Mike Trout          3.01
Shin-Soo Choo       2.96
Lorenzo Cain        2.87
Matt Chapman        2.82
Buster Posey        2.81
Ian Kinsler         2.80
Denard Span         2.70

As we would expect, Votto is far and away the best player by Effective Chase Score - in addition to having an extremely low chase rate, Votto makes contact frequently on his outside swings and makes extremely effective contact on outside pitches.

Here are the worst batters by the same metric.

Player                 Effective Chase Score
Freddy Galvis          -2.54
Michael A. Taylor      -2.65
Kevin Pillar           -2.76
Joey Gallo             -2.83
Chris Davis            -2.84
Carlos Gomez           -2.84
Odubel Herrera         -2.85
Eduardo Escobar        -3.06
Robinson Chirinos      -3.11
Teoscar Hernandez      -3.61
Adam Jones             -3.62
Tim Anderson           -3.62
Giancarlo Stanton      -3.63
Luis Valbuena          -3.67
JaCoby Jones           -4.07
Nicholas Castellanos   -4.13
Jonathan Schoop        -4.16
Lewis Brinson          -4.76
Ryon Healy             -4.82
Javier Baez            -5.65

There are a lot of free swingers here, including Gomez, Davis, Gallo, etc. Baez, however, is almost as bad as Votto is good - Baez has the worst O-Swing% by 5.5 points (Baez: 46.0%; second-worst is Kevin Pillar at 40.5%), a bottom-tier O-Contact%, and just a .237 xwOBA on outside pitches.

To view the full list of hitters with at least 1000+ pitches faced, I published my spreadsheet below.

Friday, July 6, 2018

MiLB Statcast Project Part Five: Next Steps

What's next for MiLB batted ball data? Clearly, there are issues with it, thanks to biased stringers, but there's also a wealth of valuable information in here.

Having already calculated launch angle, it seems logical that the next step would be to calculate exit velocity. It would seem as though some relationship between hit distance (calculated using the home plate location found in part three and the coordinates of the batted balls) and launch angle would yield an approximation for exit velocity, and indeed, such a relationship appears to exist at the major league level.


Despite this, using the model that I reverse engineered from Statcast and correcting for differences in hit-tracking between the stringers and the MiLB, I found that such a model was grossly inaccurate at the minor league level. Shown below are MiLB hitters with at least 200 BIP in 2016 and 200+ BIP in the majors in 2017.

Perhaps the depths of the batted ball locations are inaccurate, or perhaps the model itself has issues. This is a difficult challenge because we're trying to measure the size of an intangible object using its shadow - it's not as simple as plugging values into Excel's equation solver, as we need method behind our model. I consider this a work in progress, and I hope to update this post with a solution soon, but for now I have no clear way of estimating MiLB exit velocity.

Still, the rest of the data that we're working with appears solid and powerful. I've already revealed a couple functions that I've been using, and I hope to develop an R-package for all of these functions, including heatmaps, splits, date-ranges, a built-in R scraper, and more. I hope to keep y'all posted on this later this summer.

Thank you for reading this series! I hope this was insightful or at least entertaining. In my opinion, not enough public analysts are using MiLB data, and while it's certainly rough around the edges, there's still valuable information to be gleaned from it.



Wednesday, July 4, 2018

MiLB Statcast Project Part Four: Reverse Engineering Launch Angle

In the previous two parts, we focused heavily on creating visualizations of MiLB pitch and batted ball data, but we did not really use our data to create any workable numbers or analogs for MLB metrics. Calculating things like batting average or slugging percentage is trivial; what we lack are Statcast-style measurements such as launch angle and exit velocity. Neither value is available to us in any form in minor league data, but we can approximate them. This section focuses on launch angle for MiLB hitters.

While we do not have launch angle available in any form for MiLB hitters, we do have limited batted ball classification data - stringers will manually tabulate which balls are ground balls, which are fly balls, which are line drives, and which are pop-flies. While the stringers do not operate with anything close to the precision of BIS's ball classification system, it still gives a rough idea of players' batted ball tendencies. 

For example, let's say we want to know how frequently Ozzie Albies hit fly balls in AAA in 2017. There are two ways of calculating fly ball rate - FanGraphs includes pop-ups in its calculation of FB%, but Baseball Savant does not. We'll calculate both for posterity, as sketched below.
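A minimal sketch of both calculations in R, assuming a hypothetical data frame `bip` of Albies' 2017 AAA batted balls with a stringer-supplied `bb_type` column (the column and level names are mine):

```r
fb  <- sum(bip$bb_type == "fly_ball")
pop <- sum(bip$bb_type == "popup")
gb  <- sum(bip$bb_type == "ground_ball")
ld  <- sum(bip$bb_type == "line_drive")

(fb + pop) / (fb + pop + gb + ld)  # FanGraphs-style FB%, pop-ups included
fb / (fb + pop + gb + ld)          # Savant-style FB%, pop-ups excluded
```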


FanGraphs has limited MiLB batted ball data from STATS, and Albies' 2017 AAA figure there is fairly consistent with what we calculated from our dataset (37.9% from FanGraphs compared to 38.4% from ours). Albies hit fly balls in the MLB in 2017 at a 40.3% rate according to FanGraphs, and at a 32.1% rate according to Baseball Savant, so our minor league figures appear fairly accurate given that they are consistent with his measured values both in MiLB play and in the majors.

So how can we extrapolate launch angle from this? Launch angle plays a large part in batted ball classification, so we can use batted ball tendencies to reverse-engineer launch angle at the MLB level and then apply that to MiLB data. Using my personal Statcast DB, I found the average launch angle for each batted ball classification across all batted balls ever recorded by Statcast.

BB Type       Launch Angle
Fly Ball      36.646
Popup         63.285
Line Drive    16.756
Ground Ball   -12.553

If we treat each batted ball as having been hit at its classification's average launch angle, we can theoretically get a solid estimate of average launch angle from batted ball classifications alone. I pulled 2017 hitters with at least 200 PA and compared their estimated launch angle from batted ball classification to their actual launch angle, and the results were extremely promising. To clarify, the exact equation used was:

xLA = (FB% × 36.646) + (POP% × 63.285) + (LD% × 16.756) + (GB% × -12.553)

Our R-squared value is .93, indicating that our formula does an excellent job of estimating launch angle solely from batted ball data - not surprising, considering that Baseball Savant likely uses launch angle as the primary factor in classifying batted balls.

If we re-scale our values to get a 1:1 relationship, we have a fairly strong model for estimating launch angle from batted ball classification.


As strong as the correlation is between xLA and LA, our RMSE is a bit weak. In looking at the relationship between residual values and batted ball frequencies, it looks like we're introducing a bit of error with our POP% value.


I found that my RMSE was minimized at a pop-fly coefficient of about 60.65 - my guess is that since Statcast has difficulty tracking some balls at extreme launch angles, the true pop-fly angle is skewed upward.


We've marginally improved our RMSE and r-squared with our model. I think there are probably some bigger steps we could take to improve the model's accuracy, but at the moment, I think our r-squared value is superb, and our RMSE value is acceptable as a model of launch angle.

Armed with our model, we are now prepared to determine MiLB launch angle from the batted ball data found in our dataset.

This somewhat-intimidating wall of code grabs batted ball values and calculates estimated launch angle from them. In our csv, we now have estimated launch angle values for minor league hitters in 2016 and 2017! Of course, we need to check ourselves - how accurate are these launch angle values?
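Since the original code block isn't reproduced here, a rough sketch of the approach - assuming a hypothetical data frame `milb_bip` of MiLB batted balls with `batter` and `bb_type` columns (names are mine) - might look like:

```r
library(dplyr)

xla <- milb_bip %>%
  group_by(batter) %>%
  summarise(
    bip = n(),
    # Average of per-classification launch angles, using the tuned
    # pop-up coefficient of 60.65 from above.
    xLA = mean(case_when(
      bb_type == "fly_ball"    ~  36.646,
      bb_type == "popup"       ~  60.65,
      bb_type == "line_drive"  ~  16.756,
      bb_type == "ground_ball" ~ -12.553
    ))
  ) %>%
  filter(bip >= 200)

write.csv(xla, "milb_xla.csv", row.names = FALSE)
```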

To determine the accuracy of our results, we'll compare year n to year n+1 correlation. I pulled hitters who registered 200+ balls in play in 2016 and 2017 (210 of them), and found a correlation between 2016's launch angle and 2017's launch angle of .6606, so this is our benchmark.

We're not going to compare players with 200+ BIP in the minors from 2016 to players with 200+ BIP in the minors from 2017 - it just tells us the correlation between our measured values of FB%, GB%, LD%, and POP% in a rougher form. Instead, we're going to compare hitters with 200+ BIP in the minors from 2016 to hitters with 200+ BIP in the majors from 2017 - in this sense, we're looking at how well MiLB launch angle predicts MLB launch angle.

After pulling these values, I found only 27 hitters who registered both 200+ BIP in AAA in 2016 and 200+ BIP in the MLB in 2017, which was a bit of a disappointment. Still, the r-squared between these hitters' estimated launch angle from their 2016 MiLB campaign and their launch angle from their 2017 MLB campaign was .7274. Because we're dealing with fewer hitters (27 versus 210), and because we're dealing with consistent young hitters (no decline due to age or dramatic changes in LA, unlike in our MLB dataset), our r-squared looks better than our benchmark of .6606 - but I don't think for a second that our xLA is somehow a better predictor of launch angle than the previous year's launch angle. (EDIT: it also might have something to do with the fact that I accidentally included multiple Jose Martinezes here.)

Still, xLA appears to have undeniable predictive value.

We have a reasonable predictor of launch angle using minor league data! If we want to compare MiLB launch angles to MLB launch angles to draw comparisons between hitters, we now have that ability, and we can be reasonably confident in our ability to do so.

Monday, July 2, 2018

MiLB Statcast Project Part Three: Cleaning up and visualizing batted ball data

In our previous section, we looked at the issues involved with minor league pitch placement data, strategies for cleaning and visualizing the data, and then compared that data to MLB data. In this section, we'll grab MiLB hit data, and use similar strategies for cleaning and visualizing that data.

Looking at our data, we can see that our batted ball data-set has issues similar to those of our pitching data-set.



The issues with the batted ball data-set are as follows:

  1. There exists bias in the way that batted balls are grouped - batted balls are clustered around where fielders play, especially in the outfield.
  2. The units of the x and y coordinates are not immediately apparent.
  3. The field's dimensions are not cleanly defined.
We have no realistic approach for fixing the bias in clustering, but we can address problems 2 and 3.

Let's start by discussing the units. When stringers track a game, they place each batted ball on a 250x250-pixel map of the field; where they click is then recorded, in pixels, as the location of the hit. We have to determine a realistic pixel-to-real-world scale in order to calculate things like hit distance.

So then, let's try to establish some concrete markers for scale. If we look at the 10th lowest y-value for ground balls for a stadium, we get a rough idea of where home-plate is.



If we move that intercept down slightly and plot the median x-value of all batted balls, we should find the tip of the baseball "diamond".



From here, we can construct foul-lines knowing that a baseball field is constructed with a 90 degree angle between the lines. As long as the field is not rotated beyond what we've already done, we can simply construct perpendicular lines from the tip of our diamond outward.



To determine the dimensions of our park (to both plot the outfield lines and to figure out the scale of pixels to park), let's look at the placement of home runs in the park.



There are a surprising number of misplaced HR balls - a bunch of long balls never left the infield, according to our data. We'll filter them out. Then, we'll plot a line of best fit along the outfield wall.



This looks like a decent approximation of Coca-Cola Field's outfield wall. If we shift all the values downward and mess with the colors a bit, we have a decent approximation of what Coca-Cola Field looks like.


The wall is slightly below almost all home runs. Coca-Cola Field does not have a perfectly round wall, but this approximation gives a good visualization and looks useful for spray charts. And we can finally approximate the pixel-to-feet conversion factor! Home plate is at ~50 pixels and the centerfield wall is at ~210, a distance of 160 pixels. In real life, Coca-Cola Field measures ~400' from home to centerfield, so our conversion factor is 400/160 = 2.5. It's ~140 pixels down the left and right field lines, for values of 350' down the lines. Coca-Cola Field is actually 325' down both lines, but the field curves inward quite a bit - our semi-circular approximation can't capture that - and the values still line up quite well.
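With the scale in hand, hit distance becomes a one-liner. A minimal sketch, assuming home plate sits at roughly (125, 50) in pixel coordinates (the x-value is my assumption, taken as the median x of batted balls on the 250-wide grid):

```r
# Convert stringer pixel coordinates to an approximate hit distance in feet.
hit_distance_ft <- function(hc_x, hc_y, home_x = 125, home_y = 50, scale = 2.5) {
  sqrt((hc_x - home_x)^2 + (hc_y - home_y)^2) * scale
}

hit_distance_ft(125, 210)  # straightaway centerfield: ~400 ft
```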

With all of this implemented, let's turn this into a function!
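The original function isn't reproduced here, but a hedged sketch of what it might look like - assuming hypothetical `hc_x`/`hc_y` pixel columns and an `events` column for coloring - is below:

```r
library(ggplot2)

spray_chart <- function(bip, wall_radius = 160, home = c(125, 50)) {
  # Approximate the outfield wall as a semi-circle around home plate.
  wall <- data.frame(theta = seq(pi / 4, 3 * pi / 4, length.out = 200))
  wall$x <- home[1] + wall_radius * cos(wall$theta)
  wall$y <- home[2] + wall_radius * sin(wall$theta)

  ggplot(bip, aes(hc_x, hc_y, colour = events)) +
    geom_point(alpha = 0.6) +
    geom_path(data = wall, aes(x, y), inherit.aes = FALSE) +
    # Foul lines: perpendicular lines running outward from the diamond's tip.
    geom_abline(intercept = home[2] - home[1], slope = 1) +
    geom_abline(intercept = home[2] + home[1], slope = -1) +
    coord_fixed()
}
```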



I've overlaid Rhys Hoskins' 2017 batted balls over Coca-Cola Park (no relation to Coca-Cola Field). Our estimations of power look fairly accurate - Rhys has ~25 HR by my count on this chart, when he recorded 29 total in 2017 in AAA. Not bad for completely estimating the outfield wall as a semi-circle.

But what's more important is the information that the chart presents - from this spray chart alone, it's apparent that Hoskins hits a lot of ground balls to the right side of the infield, making him an excellent shift target. He also has substantial pull power.

We can glean this information from a scouting report, but it's important to have a visual confirmation of what's reported, and we can also pick up on systematic changes in approach. We can go deeper in terms of visualizing prospects and MiLB players.

Friday, June 29, 2018

MiLB Statcast Project Part Two: Visualizing pitch data

For the second part of our series, we'll import our pitch-by-pitch data into R, and visualize it with ggplot2. Here, we'll clean it up and work on visualizing it effectively. So, let's fire up R.

On a microscopic level, our data looks nice and clean. For example, here are the pitch locations and outcomes for Dominic Smith's pitches in a random game.


On the surface, this looks okay - the approximate locations of balls and strikes are immediately apparent. But if we zoom out to the level of a full season, we start to see a lot of problems.



To give you an idea of why this is really problematic, let me show you Smith's pitch data from Baseball Savant from his MLB career to date.



This chart shows pitch types against Smith rather than results, but the point is clear - Statcast, a much more precise measuring system, shows that pitch placement for almost all hitters is akin to a multivariate normal distribution. There is no large gap at the edges of the strike zone, as there is in Smith's MiLB data.

This pattern is present in almost every player's pitch chart. Here's J.P. Crawford in 2017.


And here's Ozzie Albies' pitch chart from the same year.


For all hitters, there appears to be a substantial amount of pro-umpire bias in the stringers' pitch placement. MiLB pitch data records extremely few marginal pitches, and as a result, pitch locations are heavily skewed.

The bias appears to come directly from the umpires' calls - there are very few balls or called strikes on the edge of the zone. It's impossible that pitchers were so precise in placing their pitches in and out of the zone, or that umpires were so accurate in their ball and strike calls - rather, stringers see where the pitch was placed, see the umpire's call, and either consciously or unconsciously adjust the pitch placement to reflect the call. Hence, all strikes end up in the strike zone and all balls outside it.

If we were to calculate plate-discipline statistics, like O-Swing%, Z-Swing%, O-Contact%, or Z-Contact%, we'd get worthless data. We can still calculate stats like SwStrk% or F-Strike%, and the pitch placement data appears to have some utility, but this is a stark reminder that this data set has a lot of problems.

We have another problem as well - we have no idea of the realistic size, shape, or scale of the strike zone! The X and Y coordinates have no immediately available units, and we have only the Y limits of the strike zone given - in completely different units than the ones for locating pitches.

What units are the X and Y coordinates in? They appear to be coordinates on a 250x250 grid, similar to the one used for batted ball placement. Eyeballing Smith's zone, the top and bottom appear to occur at Y coordinates of 180 and 120, so his strike zone is about 60 units tall. According to Smith's sz_top and sz_bot, his strike zone is approximately 22 inches tall - about 2.7 units per inch, roughly the 2.54 centimeters per inch conversion - so we can treat the pixels as centimeters.

To determine the edges of the strike zone, we can look at called strikes - we've already seen that almost all called strikes are recorded as being within the zone, so we can base our algorithmic strike zone on our called strike-zone. For the bottom of the strike zone, let's find the 3rd lowest y coordinate, and for the top, the 3rd highest (think of this as finding the median of the 5 highest or lowest coordinates). Repeating the process for the x coordinates yields the following plot.
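A minimal sketch of that edge-finding in R, assuming a hypothetical data frame `pitches` with `px`/`py` coordinates and a `description` column (names are mine):

```r
called <- subset(pitches, description == "called_strike")

# "3rd highest/lowest" trims away a couple of outliers on each side.
zone <- list(
  bottom = sort(called$py)[3],
  top    = sort(called$py, decreasing = TRUE)[3],
  left   = sort(called$px)[3],
  right  = sort(called$px, decreasing = TRUE)[3]
)
```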




Let's turn this into a working function, and remove automatic intentional balls (which are recorded as pitches at 1,1).


It's blindingly apparent why zone/out-of-zone metrics won't function with such a rigidly defined zone. However, this is not to say that the MiLB data is completely worthless - there's still some information we might stand to glean from this data.

As a practical example, let's take a look at whiffs. Joey Gallo has one of the highest swinging strike rates in the MLB, and he had similar problems while in the MiLB - Gallo had a swinging strike rate of 18.0% in AAA in 2016. Where were his whiffs coming from? I created a heat map of Gallo's whiffs in 2016.
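A hedged sketch of that heat map, reusing the hypothetical `pitches` frame from above filtered down to Gallo's 2016 AAA swinging strikes:

```r
library(ggplot2)

whiffs <- subset(pitches, description == "swinging_strike")

ggplot(whiffs, aes(px, py)) +
  stat_density_2d(aes(fill = after_stat(density)), geom = "raster", contour = FALSE) +
  scale_fill_viridis_c() +
  coord_fixed()
```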

I then grabbed Gallo's whiff heat map from Baseball Savant for his 2015-2016 seasons in the MLB, and surprise surprise, his whiffs occurred in largely the same location in the zone.


Despite the bias in terms of the zone, we can use MiLB pitch data to look at other significant trends, and explore pitchers and hitters in the MiLB to see how they might perform in the MLB. We're dealing with some messed up data, but it's not damaged beyond repair.