We are now a quarter of the way through the season, and that means it's time to introduce a little side project that I've been cooking up: the Glorious Rugby Index (or GRI, because every rating system needs an acronym).
At present, there is precious little data available for Major League Rugby. To my knowledge, there is no publicly accessible data tracking aspects of the game like tackles, meters made, number of phases, or even penalty counts. This has stifled any attempts at forming a fantasy league around MLR, as well as making it difficult for people like me to create stats-based rankings for the teams.
There is, however, one key stat that determines whether a team wins or loses, and it is publicly accessible: the score. For now, that's what the Glorious Rugby Index is based on.
Calculating the score
Fundamentally, the system works by comparing how much you score against an opponent with how much everyone else has scored against them. As an illustration, imagine this scenario:
The Snow Lions have scored an average of 20 points per game while conceding 22 points per game. The Green Bears play a match against the Snow Lions and win, 25-19. In this game, the Green Bears scored 3 points more than the Snow Lions' average conceded points, giving them a GRI on attack of 3. Similarly, the Green Bears conceded 1 point fewer than the Snow Lions' average points scored, giving them a GRI on defense of 1. In total, they have a GRI for the match of 4.
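The arithmetic in that scenario can be sketched in a few lines of Python. The function and argument names here are purely illustrative, and the sketch assumes the opponent's averages are already known before the match is scored.

```python
def match_gri(points_for, points_against, opp_avg_scored, opp_avg_conceded):
    """Return (attack, defense, total) GRI for a single match.

    attack:  points scored beyond what the opponent usually concedes
    defense: points held below what the opponent usually scores
    """
    attack = points_for - opp_avg_conceded
    defense = opp_avg_scored - points_against
    return attack, defense, attack + defense


# Green Bears beat the Snow Lions 25-19; the Snow Lions average
# 20 points scored and 22 points conceded per game.
attack, defense, total = match_gri(25, 19, opp_avg_scored=20, opp_avg_conceded=22)
print(attack, defense, total)  # 3 1 4
```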
Of course, the GRI of a single match probably doesn't accurately reflect the true talent of a team, so this number is averaged over a season. I'm also slowly phasing out last year's results so that the ratings have some consistency.
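One way to picture the averaging is as a weighted mean in which last year's matches count for less than this year's. The sketch below is only an assumption about how that could work: the `carryover_weight` parameter and its value are hypothetical, and the actual phase-out schedule isn't spelled out here.

```python
def season_gri(current_matches, previous_matches, carryover_weight=0.5):
    """Weighted average of per-match GRIs.

    current_matches / previous_matches: lists of single-match GRI values.
    carryover_weight: hypothetical weight (0 to 1) given to each of last
    season's matches; lowering it week by week would phase them out.
    """
    weighted_sum = sum(current_matches) + carryover_weight * sum(previous_matches)
    total_weight = len(current_matches) + carryover_weight * len(previous_matches)
    if total_weight == 0:
        raise ValueError("no matches to average")
    return weighted_sum / total_weight


# Made-up numbers: three matches this season, two carried over at half weight.
print(season_gri([4, -2, 6], [1, 3], carryover_weight=0.5))  # 2.5
```

Setting `carryover_weight` to 0 reduces this to a plain average of the current season.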
The system can then be used to predict the score of a future match by comparing the GRIs of the two teams playing. Let's say that the Golden Frogs have a GRI of 6.5 and the Heroic Herons have a GRI of -1. On a neutral field, we could expect the Golden Frogs to beat the Heroic Herons by 7.5 points, which is the difference between their GRIs (6.5 minus -1).
We can also take their GRIs on attack and defense and figure out a precise score by comparing them to the average number of points scored in the league. Let's assume the following:
- The Golden Frogs have an attack of -2 and defense of 8.5.
- The Heroic Herons have an attack of 3 and defense of -4.
- The league average for points scored is 25.
In this case, we take the Golden Frogs' attack of -2 and add it to the league average of 25, giving us 23. We then subtract the Heroic Herons' defense of -4, giving us 27. We do the same for the Heroic Herons, adding their attack of 3 to the league average of 25, giving us 28, then subtracting the Golden Frogs' defense of 8.5, giving us 19.5. This means the predicted final score is a Golden Frogs victory, 27-19.5, which matches the 7.5-point margin from their overall GRIs.
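The same calculation as a short Python sketch, with illustrative names: each side's predicted score is the league average, plus its own attack, minus the other side's defense.

```python
def predict_score(attack_a, defense_a, attack_b, defense_b, league_avg):
    """Predicted neutral-field scores for team A and team B."""
    score_a = league_avg + attack_a - defense_b
    score_b = league_avg + attack_b - defense_a
    return score_a, score_b


# Golden Frogs: attack -2, defense 8.5; Heroic Herons: attack 3, defense -4.
frogs, herons = predict_score(-2, 8.5, 3, -4, league_avg=25)
print(frogs, herons, frogs - herons)  # 27 19.5 7.5
```

The predicted margin always equals the difference between the two teams' total GRIs, because the league average cancels out.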
The predicted scores, as discussed later, are not very reliable on a game-by-game basis. This limits the usefulness of score-line predictions, as they don't give a good idea of what the end score will actually be. However, it's also possible to take the GRI and calculate a percentage chance that a team will win outright, and doing so produces a more accurate and useful result.
Looking at last year
Using last year's data, I calculated the GRI of all nine teams. The results were pretty much what you would expect:
| San Diego Legion | 9.8 | 2.3 | 7.5 |
What's most interesting is how these numbers both confirm and contradict some of the storylines of last season. Seattle, famed for its Sea Wall defense, was barely better than average at preventing points. Their strength actually lay in their league-leading attack. The best four teams in the league by these numbers made the playoffs, but the best team didn't win it. In fact, the numbers gave Seattle only a 12% chance of lifting the trophy.
Home field advantage (HFA)
It's generally understood that teams have an advantage when playing at their home stadium: the fans are on their side, they are familiar with the pitch, and they aren't tired from travelling. And, indeed, this is borne out in the data. A team at home has an average GRI advantage of 1.1 points per game, which translates to a 57% chance of victory between otherwise equal teams.
This number is probably lower than most people expect, but it may be that in a young league home advantage isn't quite as established. To benchmark this, I took a look at the last few years of Premiership rugby in the UK, which is a much more mature competition. In that league, home field advantage is roughly 4.5-5.5 points every season, which is pretty crazy. For reference, that sort of advantage would give the home team a 77% chance of winning against an otherwise equal team.
The real question is whether we can rely on the GRI to deliver accurate results, and whether those results are precise enough to be useful.
Score-line prediction accuracy
The GRI score-line predictions, which attempt to guess the actual final score of each game, are fairly accurate. On average, they are nearly dead on, certainly more so than I would have expected. However, the problem lies in precision. While on average the system is right, the actual variation in scores is enormous. The system only has a 50-50 chance of guessing the margin within 6 points, and on average it's off by over 10 points.
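The difference between accuracy and precision can be made concrete with three summary numbers: the mean error (is the system biased one way?), the median absolute error (the "50-50" miss), and the mean absolute error (the typical miss). The sketch below uses made-up margins, not real MLR results, and the function name is illustrative.

```python
from statistics import mean, median


def margin_error_summary(predicted_margins, actual_margins):
    """Summarize how far predicted margins land from actual margins."""
    errors = [actual - predicted
              for predicted, actual in zip(predicted_margins, actual_margins)]
    abs_errors = [abs(e) for e in errors]
    return {
        "mean_error": mean(errors),              # near 0 means accurate on average
        "mean_abs_error": mean(abs_errors),      # average size of the miss
        "median_abs_error": median(abs_errors),  # half of misses are within this
    }


# Hypothetical data: the misses cancel out on average but are large individually.
predicted = [7.5, -3.0, 12.0, 1.0]
actual = [17.5, -13.0, 16.0, -3.0]
summary = margin_error_summary(predicted, actual)
print(summary["mean_error"], summary["mean_abs_error"], summary["median_abs_error"])
# 0.0 7.0 7.0
```

A system can score perfectly on the first number and still be off by a converted try on almost every individual game, which is the situation described above.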
The graph on the left below illustrates this problem. While the trend line shows the predictions lining up with the results, it's obvious that the actual data points vary wildly. The graph on the right below shows this same data with any match involving an expansion team removed. This represents a best case scenario for the system, and while the data clearly fits better, it's still not a super close fit.
This lack of precision doesn't necessarily make the system useless for predicting scores. If you wanted to create betting lines, for example, having something that is accurate but imprecise would still be preferable to just making something up. Still, with results varying so dramatically, I don't think it's worth publishing the predicted score-line for each game.
Chance of victory accuracy
The GRI is much better at predicting a winner straight up. I developed a formula to predict the odds of victory given a particular margin, and it has done a pretty good job so far. I've calibrated it to give 95% confidence at two standard deviations, which is roughly a 12-point margin in the GRI.
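The formula itself isn't reproduced here. As a rough illustration only, the sketch below assumes a logistic curve pinned to the one calibration point mentioned above (a 12-point GRI margin giving a 95% chance of victory). The choice of curve and all of the names are assumptions, not necessarily what the GRI actually uses.

```python
import math

# Assumption: a logistic curve through (12-point margin, 95% win chance).
CALIBRATION_MARGIN = 12.0
CALIBRATION_PROB = 0.95
SCALE = math.log(CALIBRATION_PROB / (1 - CALIBRATION_PROB)) / CALIBRATION_MARGIN


def win_probability(gri_margin):
    """Chance of victory for the team favored by gri_margin points."""
    return 1.0 / (1.0 + math.exp(-SCALE * gri_margin))


print(round(win_probability(0.0), 2))   # 0.5
print(round(win_probability(1.1), 2))   # 0.57
print(round(win_probability(5.0), 2))   # 0.77
print(round(win_probability(12.0), 2))  # 0.95
```

Under this assumed curve, a 1.1-point edge works out to about 57% and a 5-point edge to about 77%, in line with the home field figures quoted earlier.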
I've again created two graphs so that you can visualize the accuracy of the system. Each match is given a 1 for a win or a 0 for a loss (draws are ignored for now), and these are then plotted according to the odds the system gave them before the match. If the GRI is assigning the correct chance of victory to each match, then the trend line should go perfectly from the bottom left to the top right at a 45-degree angle. If the system is too shy and underestimates the higher-rated team's chance of victory, then the line will be steeper. If the system is too bold and overestimates the higher-rated team's chance of victory, then the line will be shallower.
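The trend line described here can be sketched as an ordinary least-squares fit of the 0/1 outcomes against the predicted odds. A slope of 1 is the ideal 45-degree line, a slope above 1 means the system was too shy, and a slope below 1 means it was too bold. The function name and the sample data are made up for illustration.

```python
def calibration_line(predicted_probs, outcomes):
    """Least-squares slope and intercept of outcomes (0 or 1) vs predicted odds."""
    n = len(predicted_probs)
    mean_x = sum(predicted_probs) / n
    mean_y = sum(outcomes) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted_probs)
    if sxx == 0:
        raise ValueError("predicted odds must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(predicted_probs, outcomes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical case: the system only ever said 25% or 75%, but was right every time.
slope, intercept = calibration_line([0.25, 0.25, 0.75, 0.75], [0, 0, 1, 1])
print(slope, intercept)  # 2.0 -0.5
```

In that made-up case the slope of 2 says the predictions were too shy: the favorites won more often than the odds suggested.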
The chart on the left below shows that the system has been a little too bold in its predictions so far. I'm not especially concerned, since the system is still working off a bad first week and accuracy has improved every week since. Furthermore, the graph on the right has had all of the expansion team data removed, and it shows that the system has been extremely accurate for teams with pre-existing data. Once the system gets to know the expansion teams a little better, the accuracy of the system overall should improve.
To get a benchmark for accuracy, I ran the whole system on the Premiership as well, and I found similar levels of accuracy and precision. This points to something that I think we all intuitively understand: that rugby is an inherently chaotic game. You might be the better team, but that doesn't guarantee that you'll win. Any team, if they play well, can upset any other team, and every game is in play. For the system, that's a problem, but personally I think it's what makes this sport so great.