Wednesday, July 4, 2018

MiLB Statcast Project Part Four: Reverse Engineering Launch Angle

In the previous two parts, we focused heavily on creating visualizations of MiLB pitch and batted ball data, but we never really used that data to create any workable numbers or analogs for MiLB data. I consider it arbitrary to do things like calculate batting average or slugging percentage, but metrics like the ones available through Statcast, such as launch angle and exit velocity, are another matter. Neither value is available to us in any form for the minor leagues, but we can approximate them. This section will focus on launch angle for MiLB hitters.

While we do not have launch angle available in any form for MiLB hitters, we do have limited batted ball classification data - stringers will manually tabulate which balls are ground balls, which are fly balls, which are line drives, and which are pop-flies. While the stringers do not operate with anything close to the precision of BIS's ball classification system, it still gives a rough idea of players' batted ball tendencies. 

For example, let's say we want to know how frequently Ozzie Albies hit fly balls in 2017 in AAA. There are two ways of calculating fly ball rates - FanGraphs includes pop-ups in their calculation of FB%, but Baseball Savant does not. We'll calculate both for completeness.
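The two definitions can be sketched in a few lines of Python. This is only an illustration: the function name, its arguments, and the counts are hypothetical (they are not Albies' real totals), and it assumes the fly ball count excludes pop-ups.

```python
def fly_ball_rates(gb, ld, fb, pop):
    """Return (FB% with pop-ups, FB% without pop-ups) as fractions.

    gb, ld, fb, pop are counts of ground balls, line drives, fly balls
    and pop-ups. Assumption: `fb` does NOT already include pop-ups.
    """
    total = gb + ld + fb + pop
    if total == 0:
        raise ValueError("no batted balls to classify")
    with_pop = (fb + pop) / total  # FanGraphs-style: pop-ups count as fly balls
    without_pop = fb / total       # Baseball Savant-style: pop-ups kept separate
    return with_pop, without_pop


# Hypothetical counts for illustration only.
fg_style, savant_style = fly_ball_rates(gb=180, ld=80, fb=110, pop=30)
print(f"FB% including pop-ups: {fg_style:.1%}")
print(f"FB% excluding pop-ups: {savant_style:.1%}")
```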

FanGraphs has limited MiLB batted ball data from STATS, and its figure for Albies in AAA in 2017 is fairly consistent with what we calculated from our dataset (37.9% from FanGraphs compared to 38.4% from our dataset). In the majors in 2017, Albies hit fly balls at a 40.3% rate according to FanGraphs and at a 32.1% rate according to Baseball Savant, so our minor league figures appear fairly accurate: his fly ball rate in our data is in line with the measured values both in MiLB play and in the majors.

So how can we extrapolate launch angle from this? Launch angle plays a large part in batted ball classification. We can use batted ball tendencies to reverse engineer launch angle at the MLB level, and apply that to MiLB data. Using my personal Statcast DB, I found the average launch angle for each batted ball classification for all batted balls ever recorded by Statcast.

BB Type        Launch Angle (degrees)
Fly Ball        36.646
Line Drive      16.756
Ground Ball    -12.553

If we treat each batted ball as having been hit at the average launch angle for its type, we can theoretically get a solid estimate of a hitter's average launch angle from batted ball classifications alone. I pulled 2017 hitters with at least 200 PA and compared their estimated launch angle from batted ball classification to their actual launch angle; the results were extremely promising. To clarify, the exact equation used was:
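In code form, a weighted average of this kind can be sketched as follows. This is a minimal sketch and not necessarily the exact formula used: it assumes rates are fractions of all batted balls, that the fly ball rate excludes pop-ups, and it takes the pop-fly angle as a parameter because the table lists no average for pop-ups. The value 60.0 in the example is a placeholder, as are the rates.

```python
# Average launch angles (degrees) by batted ball type, from the table above.
FB_ANGLE = 36.646
LD_ANGLE = 16.756
GB_ANGLE = -12.553


def estimated_launch_angle(gb_rate, ld_rate, fb_rate, pop_rate, pop_angle):
    """Weighted average launch angle from batted ball rates.

    Rates are fractions of all batted balls and should sum to about 1.
    `pop_angle` is an assumed coefficient supplied by the caller.
    """
    return (gb_rate * GB_ANGLE
            + ld_rate * LD_ANGLE
            + fb_rate * FB_ANGLE
            + pop_rate * pop_angle)


# Hypothetical hitter: 45% GB, 20% LD, 28% FB, 7% pop-ups.
xla = estimated_launch_angle(0.45, 0.20, 0.28, 0.07, pop_angle=60.0)
print(f"xLA: {xla:.2f} degrees")
```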

Our R-squared value is .93, indicating that our formula does an excellent job of estimating launch angle solely from batted ball data - not surprising considering that Baseball Savant likely uses launch angle as a major factor in classifying batted balls.

If we re-scale our values to get a 1:1 relationship, we have a fairly strong model for estimating launch angle from batted ball classification.
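One way to do that re-scaling is to fit an ordinary least-squares line from the raw estimate to the measured launch angle and then apply it. The post does not say which method was used, so treat this as an assumed approach; the function names and the sample numbers are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rescale(values, slope, intercept):
    """Map raw estimates onto the scale of the measured values."""
    return [slope * v + intercept for v in values]


# Hypothetical raw estimates and measured launch angles for four hitters.
raw_xla = [4.0, 8.0, 12.0, 16.0]
actual_la = [5.0, 11.0, 17.0, 23.0]

slope, intercept = fit_line(raw_xla, actual_la)
scaled_xla = rescale(raw_xla, slope, intercept)
print(slope, intercept, scaled_xla)
```

After re-scaling, a plot of the scaled estimate against the measured value should fall along the line y = x, which is what makes the RMSE directly interpretable in degrees.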

As strong as the correlation is between xLA and LA, our RMSE is a bit weak. In looking at the relationship between residual values and batted ball frequencies, it looks like we're introducing a bit of error with our POP% value.

I found that my RMSE was minimized at a pop-fly coefficient of about 60.65 - my guess is that since Statcast has difficulties tracking some balls at extreme launch angles, the true pop-fly angle is skewed upward.
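A search like this can be sketched as a simple grid search over candidate pop-fly coefficients, keeping the one with the lowest RMSE. This is an assumed approach, since the post does not say how the minimum was found. The hitters below are synthetic: their "actual" launch angles are generated with a pop-fly angle of 55.0, so the search should recover that value.

```python
import math

FB_ANGLE, LD_ANGLE, GB_ANGLE = 36.646, 16.756, -12.553


def predict(gb, ld, fb, pop, pop_angle):
    """Weighted average launch angle for one hitter (rates as fractions)."""
    return gb * GB_ANGLE + ld * LD_ANGLE + fb * FB_ANGLE + pop * pop_angle


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )


def best_pop_angle(hitters, candidates):
    """Return the candidate pop-fly angle that minimizes RMSE.

    hitters: list of (gb, ld, fb, pop, actual_la) tuples.
    """
    actual = [h[4] for h in hitters]

    def error(pop_angle):
        predicted = [predict(gb, ld, fb, pop, pop_angle)
                     for gb, ld, fb, pop, _ in hitters]
        return rmse(predicted, actual)

    return min(candidates, key=error)


# Synthetic hitters whose launch angles were built with a 55.0 degree pop-fly angle.
rates = [(0.45, 0.20, 0.28, 0.07), (0.40, 0.22, 0.30, 0.08), (0.50, 0.18, 0.27, 0.05)]
hitters = [(gb, ld, fb, pop, predict(gb, ld, fb, pop, 55.0))
           for gb, ld, fb, pop in rates]

candidates = [40 + 0.5 * i for i in range(81)]  # 40.0 to 80.0 in 0.5 degree steps
best = best_pop_angle(hitters, candidates)
print(best)
```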

We've marginally improved our RMSE and r-squared with our model. I think there are probably some bigger steps we could take to improve the model's accuracy, but at the moment, I think our r-squared value is superb, and our RMSE value is acceptable as a model of launch angle.

Armed with our model, we are now prepared to determine MiLB launch angle from the batted ball data found in our dataset.

This somewhat intimidating wall of code grabs batted ball values and calculates estimated launch angle from them. In our csv, we now have estimated launch angle values for minor league hitters in 2016 and 2017! Of course, we need to check ourselves - how accurate are these launch angle values?

To determine the accuracy of our results, we'll look at the correlation between year n and year n+1. I pulled MLB hitters who registered 200+ balls in play in both 2016 and 2017 (210 of them), and found a correlation between 2016's launch angle and 2017's launch angle of .6606, so this is our benchmark.
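A year-to-year check like this comes down to a Pearson correlation between two paired lists, one value per hitter per season. A minimal sketch, with a hypothetical function name and made-up launch angles:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up average launch angles for the same four hitters in consecutive years.
la_year_n = [10.0, 12.0, 14.0, 16.0]
la_year_n_plus_1 = [11.0, 11.5, 15.0, 15.5]

r = pearson_r(la_year_n, la_year_n_plus_1)
print(f"r = {r:.4f}, r-squared = {r * r:.4f}")
```

Note that r and r-squared are different quantities, so it matters which one is being compared against a benchmark.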

We're not going to compare players with 200+ BIP in the minors from 2016 to players with 200+ BIP in the minors from 2017 - it just tells us the correlation between our measured values of FB%, GB%, LD%, and POP% in a rougher form. Instead, we're going to compare hitters with 200+ BIP in the minors from 2016 to hitters with 200+ BIP in the majors from 2017 - in this sense, we're looking at how well MiLB launch angle predicts MLB launch angle.

After pulling these values, I only found 27 hitters who registered both 200+ BIP in AAA in 2016 and 200+ BIP in the MLB in 2017, which was a bit of a disappointment. Still - our r-squared value between these hitters' estimated launch angle from their 2016 MiLB campaign and their launch angle from their 2017 MLB campaign was .7274. Because we're dealing with fewer hitters (27 versus 210) and because we're dealing with consistent young hitters (there's no decline due to age or dramatic changes in LA, unlike in our MLB dataset), our r-squared looks better than our benchmark of .6606, but I don't think for a second that our xLA is somehow a better predictor of launch angle than the previous year's launch angle. (EDIT: it also might have something to do with the fact that I accidentally included multiple Jose Martinezes here)
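One way to guard against that kind of name collision is to pair the two seasons on a player ID rather than on a name. A minimal sketch, assuming each season is available as a mapping from a unique ID to (name, BIP, launch angle); that structure, the function name, the IDs, and all the numbers below are hypothetical.

```python
def pair_seasons(milb_season, mlb_season, min_bip=200):
    """Pair hitters across two seasons by player ID.

    Each argument maps player_id -> (name, balls_in_play, launch_angle).
    Only hitters meeting `min_bip` in BOTH seasons are returned.
    """
    pairs = []
    for player_id, (name, milb_bip, milb_la) in milb_season.items():
        if player_id not in mlb_season:
            continue
        _, mlb_bip, mlb_la = mlb_season[player_id]
        if milb_bip >= min_bip and mlb_bip >= min_bip:
            pairs.append((player_id, name, milb_la, mlb_la))
    return pairs


# Two different players share a name but not an ID (all values made up).
milb_2016 = {
    1001: ("Jose Martinez", 310, 9.8),
    1002: ("Jose Martinez", 250, 14.1),
    1003: ("Player C", 150, 11.0),
}
mlb_2017 = {
    1001: ("Jose Martinez", 240, 10.5),
    1003: ("Player C", 300, 12.0),
}

pairs = pair_seasons(milb_2016, mlb_2017)
print(pairs)
```

Joining on the name alone would have matched both minor league rows to the single major league row; the ID join keeps only the true match, and the BIP filter drops the hitter who fell short in one season.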

Still, xLA appears to have undeniable predictive value.

We have a reasonable predictor of launch angle using minor league data! If we want to compare MiLB launch angles to MLB launch angles across hitters, we now can, and with reasonable confidence.

