Rahm Emanuel was famously quoted as saying:
“You never let a serious crisis go to waste. And what I mean by that it’s an opportunity to do things you think you could not do before.”
The quote is all I’ll say on politics or politicians. Let’s talk about what changes you would make in the MLB rules (and more, like the schedule) if you could. But first, some background.
When Ghost started this site he enlisted a few folks (like me) to get involved. Given my background in technology and analytics I focused on those two areas. At the same time a colleague and I were considering writing a book on a particular feature of SAS Software (widely used for both data management and analytics). We needed data, and lots of it, to use as the sample data for the book. I convinced my co-author that the MLB Game Day data would be exactly what we needed. Not being a baseball fan, he was initially skeptical, but after I showed him how much data there was (and how complicated it was), he bought in.
So we embarked on a very long effort, with lots of twists and turns, to get permission to use that data. The Game Day site says the data is free for non-commercial purposes, but a book that a software vendor is going to sell is a commercial purpose. So we needed permission. Needless to say, that never happened. So I decided that I needed to generate the data myself, and I wanted data like baseball, but not baseball. So I turned to a few folks here (Ghost of Steve M., Section222 Game7!, sjm308, Andrew Lang) for some ideas on rule changes. They made great suggestions, not all of which I could implement: I had to write programs to generate the data, and some of the ideas were hard to implement in the limited time available. The best part of those conversations was Deuces saying something along the lines of:
I’d love to help you come up with rules for this bizarro ball game.
And of course, the name of the game became Bizarro Ball (notice it is also BB).
Here are the rules of the game we came up with. The following text is an extract from section 5.1 of an early draft of the book; it describes the rules and a few details about Bizarro Ball. The premise of the book is a team of consultants (my co-author and me) writing a Proof of Concept for the Bizarro Ball headquarters (thus the references to users).
An Overview of Bizarro Ball
The data to be generated is for a fictitious game called Bizarro Ball. Bizarro Ball is conceptually similar to baseball, with a few wrinkles. Our generated sample data will include data for teams, players, and games. The game data will include a row for each pitch, at bat, and runner. This provides us with a rich set of data and also provides the opportunity to highlight both well-known and lesser-known (or less-used) capabilities of the SAS hash object.
The key features of Bizarro Ball that we agreed to implement in our programs to generate the sample data include:
- Creating data for 32 teams, split between 2 leagues with 16 teams in each league.
- There is no interleague play during the regular season. Each team plays the other 15 teams in their league.
- Each team plays each other team a total of 12 times: 6 as the home team and 6 as the away team. In other words, a balanced schedule.
- Games are played in series consisting of 3 games.
- Each week has 2 series for each team. The first series is played on Tuesday, Wednesday, and Thursday; the second series is played on Friday, Saturday, and Sunday. Monday is an agreed-upon off-day for each team. This off-day is used when it is necessary to schedule a date for a game that was cancelled (e.g., due to the weather). It was agreed that, to simplify the programs that generate our sample data, we would assume that no such makeup games are needed.
- Since each team plays each other team in their league 12 times, this results in a regular season of 180 games. Since each team plays 6 games a week, the Bizarro Ball regular season is 30 weeks long.
- Another simplifying assumption that was agreed to was that we could generate a schedule without regard to constraints related to travel time or rules about consecutive home or away series. As long as each team has two 3-game series with each other team as the home team and two 3-game series as the away team, we could ignore other constraints. And as a further simplification, it was agreed that we could generate a balanced schedule for the first 15 weeks, with each team having one series as the home team and one as the away team against each opponent during that 15-week period. The schedule for weeks 16 through 30 is generated by copying the schedule from weeks 1 through 15 and reversing which team is the away team and which team is the home team.
- Each game is 9 innings long and games can end in a tie.
- If the home team is ahead going into the bottom half of the 9th inning, they still play that half-inning. The reason is that the tie breakers for determining the league champion include criteria that could adversely impact a good team if it is often ahead at the beginning of the bottom half of the 9th inning.
- Each team has 25 players and has complete control over the distribution of the positions a player can play. It was agreed that we would make some simplifying assumptions and assign players to each team in a systematic way. For example, each team would have 5 starting pitchers and 8 relief pitchers. The Business Users agreed to this as long as there was some flexibility in the number of players for each position.
- Each team would set its lineup for each game using whatever criteria they felt appropriate. We informed the Business Users that writing the logic for a rules-based approach to do this did not add value to the PoC and would take significant extra time. So it was agreed we could randomize the generation of the lineup for each game.
- One of the key differences between Bizarro Ball and baseball is reflected in how pitching changes are handled. First, there is no designated hitter. Second, pitching changes can only be made at the beginning of an inning. Except in the case of an injury, the pitcher who starts an inning must complete the inning (i.e., get 3 outs). And third, the pitcher who started the game must pitch 6 innings (again barring injury), at which point he can be replaced. In order to simplify the programs that generate the data, the Business Users agreed that we could ignore the possibility of injuries to pitchers. We would implement logic so that, starting in the 7th inning, a pinch hitter would be used for a pitcher if his team was behind, and the pitcher would be replaced by a series of relief pitchers who would each pitch one inning.
- Another difference between Bizarro Ball and baseball is what happens when a batter draws a walk. In Bizarro Ball, every runner on base advances one base when the batter draws a walk. So for example, if there are runners on first base and third base when a batter draws a walk, the runner who was on first goes to second, and the runner on third also advances – to home plate – and scores a run.
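For readers who like to see rules as code, the scheduling bullets above can be sketched in a few lines. The programs in the book are written in SAS; this is a Python sketch, not the book's code. The names `round_robin` and `league_schedule`, the team labels, and the choice of the circle method for the round-robin are all assumptions made for illustration. It builds 15 weeks in which each team hosts and visits every league opponent once, then mirrors them for weeks 16 through 30.

```python
from collections import Counter

GAMES_PER_SERIES = 3  # from the rules above


def round_robin(teams):
    """Circle method (an assumed technique): for an even number of teams,
    return len(teams) - 1 rounds in which every pair meets exactly once."""
    n = len(teams)
    if n % 2 != 0:
        raise ValueError("need an even number of teams")
    ring = list(teams)
    rounds = []
    for _ in range(n - 1):
        # Pair the ends of the ring working inward: (home, away).
        rounds.append([(ring[i], ring[n - 1 - i]) for i in range(n // 2)])
        # Keep the first team fixed and rotate the others by one place.
        ring = [ring[0], ring[-1]] + ring[1:-1]
    return rounds


def league_schedule(teams):
    """Return a list of weeks; each week holds 2 series slots, and each
    slot is a list of (home, away) pairings for one 3-game series."""
    rounds = round_robin(teams)
    slots = []
    # Two passes over the rounds: the second pass swaps home and away,
    # so each team hosts and visits every opponent once in the first half.
    for s in range(2 * len(rounds)):
        pairs = rounds[s % len(rounds)]
        if s >= len(rounds):
            pairs = [(away, home) for home, away in pairs]
        slots.append(pairs)
    first_half = [slots[i:i + 2] for i in range(0, len(slots), 2)]
    # Second half of the season: copy the first half, reversing home and away.
    second_half = [
        [[(away, home) for home, away in slot] for slot in week]
        for week in first_half
    ]
    return first_half + second_half


def games_per_team(schedule):
    """Count regular-season games for each team in a schedule."""
    counts = Counter()
    for week in schedule:
        for slot in week:
            for home, away in slot:
                counts[home] += GAMES_PER_SERIES
                counts[away] += GAMES_PER_SERIES
    return counts
```

Calling `league_schedule` with 16 team names gives 30 weeks, with every team hosting each opponent in two 3-game series, and `games_per_team` reports 180 games for each team. Travel and consecutive-series constraints are ignored here, just as in the simplifying assumption above.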
In order to allow for flexibility in generating the data that describes the results of each pitch and each at bat, the Business Users agreed to provide data tables that describe the distribution of the results of each pitch (i.e., was it a ball or a strike or an out or a hit, and so on). Likewise, they agreed to provide data tables that describe the distribution of the results for each hit. We agreed that these tables could be edited to produce different simulated results. A further simplification that was agreed to was to not include in our data generation the logic to implement:
- Sacrifice Bunts
- Stolen Bases
- Sacrifice Flies
The Business Users reluctantly agreed to this suggestion, since these events are part of the game. We pointed out that they were not needed for the PoC and complicated the generation of the simulated data, adding that they could certainly be included once the PoC was accepted, the application was built and put into production, and transactional data from real games was being used.
We also agreed to defer to later the issue of trading players, dropping players, replacing players, and so on.
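The editable distribution tables and the walk rule described above lend themselves to a small sketch as well. Again, this is Python rather than the SAS used in the book. The probabilities in `PITCH_RESULTS` are invented placeholders, not values from the Business Users' tables, and `sample_pitch` and `advance_on_walk` are hypothetical names.

```python
import random

# Hypothetical, editable table of pitch results. The weights are made up
# for illustration; changing them changes the simulated results.
PITCH_RESULTS = {"ball": 0.36, "strike": 0.44, "out": 0.13, "hit": 0.07}


def sample_pitch(rng, table=PITCH_RESULTS):
    """Draw one pitch result according to the weights in the table."""
    outcomes = list(table)
    weights = list(table.values())
    return rng.choices(outcomes, weights=weights, k=1)[0]


def advance_on_walk(bases):
    """Apply the Bizarro Ball walk rule.

    bases is a tuple (first, second, third) of booleans saying whether a
    runner is on each base. Every runner moves up one base and the batter
    takes first. Returns (new_bases, runs_scored).
    """
    first, second, third = bases
    runs = 1 if third else 0
    return (True, first, second), runs


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so a run can be repeated
    print([sample_pitch(rng) for _ in range(5)])
    # Runners on first and third, batter walks: the runner on third scores.
    print(advance_on_walk((True, False, True)))
```

Keeping the weights in a plain table, separate from the sampling logic, mirrors the idea of letting the Business Users edit the distributions without touching the programs.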
Some Other Notes
Before leaving it to the TalkNats community to talk about whatever changes they would like to see, I’d like to close with a few personal notes. But first, don’t hold back on any ideas no matter how off-the-wall they are. You never know what they could lead to.
- My co-author had never been to a baseball game. In fact, he had never even watched one start to finish. As chance would have it, we were invited to present a seminar to the Boston SAS User’s group about the topic of the book. And guess what, the Red Sox were in town. So I got to take him to his first ever baseball game – and it was at Fenway Park. That was quite an experience for both of us (though it was my second time at Fenway).
- Even though the TalkNats blogs played a role in the book, I never wrote about it here because it is a book about technology, not baseball. Although some of the text about the business user questions might be of interest to baseball fans, I did not want anyone to think it was a baseball book, buy it, and then be disappointed.
- SAS (which is both the software vendor and the publisher of the book) has decided that, in response to Covid-19, they are making many of the eBooks in their library available for free. Need to keep us software and data scientist type geeks busy with geeky stuff. So if you are interested, you can download a free copy of Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study. When you visit the page it shows a price; a coupon is automatically applied at checkout to set the price to $0.00. Even if you are not a SAS user, you might find it an interesting book to skim. Please note that you may have to create an account (they are free) on the SAS site. I am not sure if there is a guest checkout, as I have an account that was automatically used.
- And one last note: if you are interested in learning more about SAS Software, they provide the SAS University Edition, which is free to anyone who wants to learn the software (no need to be associated with a university). And yes, you can download the sample data and programs and experiment with them using that free version of SAS.