Most every batter wants to only swing at pitches he can hit. Clearly some are much, much, much better at it than others.
Thanks to a comment posted recently by WarningTrackPower, I was able to add logic to determine if the pitch was in the strike zone. In prior years (pre 2018?) there was a field in the Game Day data called zone which could be used to determine if a pitch was in the zone according to the pitch track data. That field was removed, and there was no documentation for how to use the other data fields to determine if the pitch was in the strike zone. Here is the video that finally provided the key (published in August of this year).
UPDATE: Work on this article started a while back and was mostly complete before the Jeimer Candelario signing, so the tables were recreated to include him. Note that Nationals players from 2022 who are no longer on the team are included in the table for comparison purposes.
The image at right illustrates how the video describes the strike zone:
- The green area is the area of the strike zone corresponding to the width of home plate and the area between the knee and the midpoint between the waist and shoulders of the batter.
- The blue area width is half the diameter of the ball. For a pitch where the center of the ball is in the blue area, we know that part of the ball is in the green area (the dimensions captured with the data are the center of the ball).
- The orange area is what they consider to be the Margin Of Error. They describe pitches in this area as ones that can be expected to be called a strike. In the table below, I describe this area as Edge.
- Any pitch outside the orange area is a ball.
So we have 9 areas: each pitch is classified as Strike (green or blue), Edge (orange) or Ball (outside the orange) in the horizontal dimension and again in the vertical dimension, which gives 3 x 3 = 9 combinations.
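For readers who want to try this themselves, here is a minimal sketch of that classification in Python. It is not the code behind the tables. The argument names (px, pz, sz_bot, sz_top), the assumption that all values are in feet, and especially the 1-inch width of the Edge band are assumptions for illustration; the video does not give me a number I can quote here.

```python
# All distances in feet. The Edge band width is a placeholder assumption.
PLATE_HALF_WIDTH = 17 / 12 / 2   # home plate is 17 inches wide
BALL_RADIUS = 2.9 / 12 / 2       # a baseball is roughly 2.9 inches across
MARGIN_OF_ERROR = 1 / 12         # hypothetical 1-inch orange (Edge) band


def classify_distance(distance_outside_green):
    """Label a pitch by how far the ball's center is outside the green area."""
    if distance_outside_green <= BALL_RADIUS:
        return "Strike"  # green or blue area
    if distance_outside_green <= BALL_RADIUS + MARGIN_OF_ERROR:
        return "Edge"    # orange area
    return "Ball"


def classify_horizontal(px):
    """px: horizontal position of the ball's center, 0 = middle of the plate."""
    return classify_distance(abs(px) - PLATE_HALF_WIDTH)


def classify_vertical(pz, sz_bot, sz_top):
    """pz: height of the ball's center; sz_bot and sz_top: the batter's zone."""
    # Negative when the center is inside the zone, positive when above or below.
    distance = max(sz_bot - pz, pz - sz_top)
    return classify_distance(distance)
```

Calling classify_horizontal and classify_vertical on the same pitch gives the pair of labels used in the rest of the article.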
For anyone who is interested, here is a screenshot of one PA. It shows the PA/AtBat data and the Pitch data, which is a child of the AtBat data row.
With the above background I decided to look at all the Nationals batters and determine what percent of the time they swung at pitches in each area.
Steve: I think it would be better to look at the swing and miss rates. Some guys like pitches out of the zone.
Don: Great point. I can easily rework the logic. The table shows how I recoded the PitchResult value (DES in the source XML files for anyone who is looking closely). The table includes all the distinct values found in the data. The Swung and Missed fields are my recode based on how it appears other sites do it.
For the purposes of the Swing and Miss calculation, I only include the pitch rows where the value of Swung is Yes.
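As a rough sketch of that calculation in Python: the recode dictionary below is a stand-in for the table, not a copy of it. The description strings and their Swung/Missed flags are assumptions for illustration, and the function name is hypothetical.

```python
# Hypothetical recode: pitch description -> (swung, missed).
# These entries are examples only; the real table covers every distinct value.
RECODE = {
    "Ball": (False, False),
    "Called Strike": (False, False),
    "Swinging Strike": (True, True),
    "Foul": (True, False),
    "In play, out(s)": (True, False),
}


def swing_and_miss_rate(descriptions):
    """Share of swings that missed, or None if the batter never swung."""
    # Keep only the pitches where Swung is Yes.
    swings = [d for d in descriptions if RECODE[d][0]]
    if not swings:
        return None
    misses = sum(1 for d in swings if RECODE[d][1])
    return misses / len(swings)
```

Returning None when there are no swings keeps an empty column from showing up as a 0% miss rate.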
The table below shows Swing and Miss calculations for the following combinations.
- Ball: Ball in both dimensions
- Strike: Strike in both dimensions
- Ball:Edge: Ball in one; Edge in other
- Ball:Strike: Ball in one; Strike in other
- Edge:Edge: Edge in both
- Strike:Edge: Strike in one; Edge in other
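The six combinations above can be built from the two per-dimension labels without caring which dimension is which. Here is a small Python sketch of one way to do that; the function name and the ordering trick are mine, not taken from the actual logic behind the table.

```python
# Order used only to build labels that match the list above.
LABEL_ORDER = ["Ball", "Strike", "Edge"]


def combine(horizontal, vertical):
    """Collapse the two dimension labels into one of the six combinations."""
    if horizontal == vertical:
        # Ball and Strike keep a single name; two Edges become Edge:Edge.
        return "Edge:Edge" if horizontal == "Edge" else horizontal
    first, second = sorted([horizontal, vertical], key=LABEL_ORDER.index)
    return f"{first}:{second}"
```

Because the pair is sorted before it is joined, a pitch that is a Ball horizontally and a Strike vertically lands in the same Ball:Strike bucket as the reverse.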
Note that additional detail is available. For example, we can look at which dimension is a ball, a strike or an edge for both the horizontal and the vertical dimension. Summarizing to that level would be far too detailed to start with IMO. And for Edge and Ball, we can also distinguish between inside vs. outside.
The color coding is based on looking at the 10th, 25th, 75th and 90th percentiles for the rest of MLB (i.e., all players excluding Nationals batters). To clarify a nuanced point: the data for Bell and Soto count in the above table for when they were on the Nationals; their pitches from when they batted for the Padres are used in the percentile calculations.
- Green: top 10% of rest of MLB (i.e., low miss rate)
- Blue: top 25% of rest of MLB
- Black: between the 25th and 75th percentile
- Orange: worst 25%
- Red: worst 10%
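In Python, the color assignment could be sketched roughly as follows. This is an illustration and not the code used for the tables: the function names are hypothetical, the choice of percentile method is an assumption, and so is how a value sitting exactly on a cutoff is treated.

```python
from statistics import quantiles


def percentile_cutoffs(rest_of_mlb_rates):
    """Return the 10th, 25th, 75th and 90th percentiles of the miss rates."""
    # n=20 gives cut points at 5%, 10%, ..., 95% (19 values).
    cuts = quantiles(rest_of_mlb_rates, n=20, method="inclusive")
    return cuts[1], cuts[4], cuts[14], cuts[17]


def color_for(rate, cutoffs):
    """Map one batter's miss rate to a color; a lower miss rate is better."""
    p10, p25, p75, p90 = cutoffs
    if rate < p10:
        return "Green"
    if rate < p25:
        return "Blue"
    if rate <= p75:
        return "Black"
    if rate <= p90:
        return "Orange"
    return "Red"
```

The cutoffs are computed once per column from the non-Nationals batters and then applied to each Nationals batter in that column.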
I like that Ruiz has a lot of Green and Blue. I am a bit surprised that Vargas has as much blue as he does. Perhaps that is why he continued to play. The results also perhaps make it clear why Voit was non-tendered.
Steve: Not at all surprised about Ruiz. On Vargas, nobody should tell Stever 🤣.
Don: I think he will notice. Quite a few of the columns don’t have many pitches that are swung at. That also provides some insight. If there aren’t many Edge:Edge swung at, that is a very good thing.
The following table drills down on the horizontal and vertical dimensions for the columns that are not the same in each dimension (i.e., all but Ball, Strike and Edge:Edge). Note that the same color coding is used. Lots to digest here.
Steve: Indeed there is. I hope someone is telling Victor about how often the pitch is at the height of the strike zone but is not a strike.
Don: Drilling down on the 88 pitches that are vertical strikes and horizontal balls, 55 of them are inside pitches; for the 19 that are vertical strikes and are on the edge horizontally, 13 are inside. He clearly needs to work on not swinging at inside pitches.
Seems to me we should go with this now and get feedback on what further detail the TalkNats community would like to see. That can then perhaps be a follow-up post at some future point in time.
Steve: Sounds like a plan.