The journos have finally discovered expected goals and some of them are outraged. I was asked on Twitter yesterday to explain expected goals. Some presenter named Jeff Stelling went on a rant about expected goals on Sky Sports, making fun of a manager who used xG to say that the game was closer than people think and then calling expected goals “the most useless stat in football history.” This rant has in turn caused many reporters to line up on either side of the debate, with Barry Glendenning tweeting that “imbeciles twatting on about xG will be pinpointed as the exact moment football ate itself” and Sid Lowe, Rafa Honigstein, and many others responding to him suggesting that while there may be flaws, it’s really not a bad stat.
It’s funny to me to see this debate play out so publicly all of a sudden. Expected goals has been around for years – I’ve been using it for three years and I believe it was crafted at least five years ago. That doesn’t mean anything other than, I guess, it has finally seeped into the football consciousness enough that everyone knows about it.
I want to be very clear from the start: I don’t mind if people don’t like stats. I’ve been writing a stats/data-informed football blog now for 10 years and while at first I was defensive about my use of data I am now sanguine. If you don’t like data, go read another site. There are plenty of people out there who write “from their gut”. My problem is that my gut is dumb and so I need data to inform my writing.
The garden-variety critiques we get of stats are 1) they don’t pass the eye test, 2) I’d rather believe my own experience, and 3) they don’t tell the whole story. I’ve been over these critiques so many times before but let’s just do it once more for fun:
- Stats are collected with eyes – trained humans watch games and collect data points based on pre-agreed definitions.
- That’s great, and in your experience you have no doubt had a conversation with another fan who saw the game completely differently from you. Stats are just seeing the game differently from you. Neither view is inherently more right or wrong. Stats, however, have pre-agreed definitions and since we have now been collecting them over a long period of time we have a lot more data available to us (all of us) than someone’s memories.
- Nothing tells the whole story. Anyone who tells you that their stat tells the whole story is selling you a lie. Anyone who tells you that their article tells the whole story is selling you a lie. It’s like making a pie from scratch: if you want to tell the WHOLE story, you need to go all the way back to the big bang.
I also find that the vast majority of people who resist the modern stats have no problem quoting stats all the time – the stats that they like. You will often see someone who ridicules something like xG say “Ozil ran 14k today” or “Arsenal have never lost a home game after a Monday match when Fabregas is fit” or “he made 12 tackles!” I don’t see the anti-stats people being very consistent with their anti-stats. What they don’t like is a stat which challenges their perception.
And that’s the thing: stats are NOT perfect! What is a tackle? You probably think you know. You don’t. A tackle in the Opta world is when one player attempts to get past another player while in possession of the ball. If the player with the ball gets past his man, it’s a successful dribble and the defender gets a “was dribbled”. If the defender stops the attacker it’s a successful tackle – whether he wins the ball back for himself or not. And if a foul happens in that instance it’s not scored as a tackle, it’s a foul.
But what do tackles even mean? What does it mean if your team tackles a lot? No one has yet come up with a meaningful way to use tackles to predict anything because some teams and managers like to tackle more than others. Sam Allardyce’s teams are typically thought of as “tough tacklers” but they only tackle in the final third and actually show a great deal of restraint in overall tackling, so despite their lack of possession, they don’t tackle much.
Bournemouth are the team with the fewest tackles per game this season, just 12.8. Burnley are 4th fewest with 14.1, Brighton 14.3, West Brom 14.8. These teams all play a style of football that doesn’t want players diving into tackles. They also don’t give up a lot of space for players to dribble. So, their tackle numbers are low. Does that mean they don’t play defense? Hardly.
But did you see what I just did there? I hope I didn’t make anyone feel bad about their experience of the game. That’s not my intent. I am not “stats bombing” you or trying to belittle you. But I did just use stats to show both why stats are kind of useless and how they can be useful. That’s my little stats Jedi Mind Trick – these are not the stats that you’re looking for, these stats are free to go.
Anyone who approaches data rationally knows that all data is flawed – and some more flawed than others. Expected goals is not perfect. There were several matches this weekend where the scoreline and the expected goals numbers weren’t even remotely close. That can happen! Because expected goals is generated by aggregating ALL of the shots that came before. All of the shots from outside the box have a 3% chance of scoring. So, when a goal is scored from outside the box the xG is 0.03 and the actual G is 1, a discrepancy of 0.97. Or when a shot goes in from inside a crowded 18-yard box, there is an xG assigned of just 0.1 and an actual G of 1.
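If it helps to see the idea in code, here is a rough sketch of how a per-shot xG value falls out of historical shots. The shot counts below are invented purely so the rates land on the 3% and 10% figures above; they are not real data, and the zone names and the `zone_xg` helper are made up for illustration. Real models use far more detail than two zones.

```python
# Hypothetical historical counts, invented for illustration only.
# They are chosen so the rates match the 3% and 10% figures in the text.
historical_shots = {
    "outside_box": {"shots": 1000, "goals": 30},
    "crowded_box": {"shots": 500, "goals": 50},
}


def zone_xg(zone):
    """xG for one shot from a zone: past goals divided by past shots."""
    record = historical_shots[zone]
    return record["goals"] / record["shots"]


outside_xg = zone_xg("outside_box")

# When a long-range shot actually goes in, the scoreboard shows 1 goal
# while the model only credited the shot with its historical rate.
gap = 1 - outside_xg

print(f"xG per shot from outside the box: {outside_xg:.2f}")
print(f"Gap between actual and expected when it goes in: {gap:.2f}")
```

That gap is not a bug in the stat. It is just what happens when a single shot gets compared to the average of every similar shot that came before it.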
The Arsenal-Spurs game is a great example. Arsenal went 2-0 up on their first four shots. They had a shot blocked from outside the box (0.03), Lacazette missed inside the box (0.1), Mustafi scored a header from the 11 meter circle off a set play (0.1), and Alexis scored a goal from like 6 inches (0.45). Total expected goals for Arsenal in the first 68 minutes – 0.68. But then in the last 30 minutes Arsenal went buck wild! Alexis had two shots one-v-one with the keeper and Arsenal took 8 more shots in that time. That brought their total xG for the game up to 2.03, matching the actual goals scored.
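The arithmetic for those first four shots is easy to check. This little sketch just adds up the per-shot values quoted above; the tuple layout and variable names are my own invention for the example, not anything from a data provider.

```python
# The first four Arsenal shots as described above:
# (description, xG value, did it go in?)
first_four_shots = [
    ("blocked shot from outside the box", 0.03, False),
    ("Lacazette miss inside the box", 0.10, False),
    ("Mustafi header from a set play", 0.10, True),
    ("Alexis close-range finish", 0.45, True),
]

# Team xG is simply the sum of the individual shot values.
# Rounded to two places to hide floating-point noise.
xg_total = round(sum(xg for _, xg, _ in first_four_shots), 2)

# Actual goals are counted directly from the outcomes.
goals = sum(1 for _, _, scored in first_four_shots if scored)

print(f"xG: {xg_total}, goals: {goals}")
```

Two goals from 0.68 xG looks like a mismatch at that point in the match, and it is. It only evens out once the later, better chances get added to the pile.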
There are lots of ways to look at this. First, you can ask “how do we improve xG?” A great question, and the stat is constantly being refined. My formula is going to be modified to remove blocked shots. Teams like Leicester and Burnley have shown that aggressive shot blocking is an effective defensive strategy. It seems contrary to logic, and most stats suggest that overall shots allowed correlates strongly with goals allowed, but clearly there is a problem with xG and teams who block a lot of shots. Those teams tend to look like they should be giving up a lot more goals than they are. In my formula, Burnley have a defensive xG allowed of 19 but they have only conceded 9. I can’t wait to revise that by removing blocked shots and just accounting for shots missed or on target.
Removing blocked shots will increase the xG number on shots from distance (most of them are blocked) and on shots in the 18 yard box. This will probably make xG map better to games with fewer shots.
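Here is a small sketch of why dropping blocked shots pushes the per-shot number up. The counts are made up for illustration (they are not from my formula or from any real dataset), and the sketch assumes a blocked shot never counts as a goal, so the goal total stays the same while the pool of shots shrinks.

```python
def conversion_rate(goals, shots):
    """Share of shots that ended up as goals."""
    return goals / shots


# Hypothetical long-range shot counts, invented for illustration only.
total_shots = 1000
blocked_shots = 400
goals = 30  # assumed: none of these came from a blocked shot

# Current approach: every shot counts, blocked or not.
rate_all = conversion_rate(goals, total_shots)

# Revised approach: only shots that missed or were on target.
rate_unblocked = conversion_rate(goals, total_shots - blocked_shots)

print(f"xG per shot, all shots:       {rate_all:.3f}")
print(f"xG per shot, blocks removed:  {rate_unblocked:.3f}")
```

Same goals, fewer shots in the denominator, so every remaining shot is worth a bit more. The flip side is that a team who gets lots of shots blocked will see their shot count, and probably their xG total, drop.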
Second, you could look at the discrepancy as Mustafi scoring a really great header. When fans see a goal scored from outside the 18 yard box they know that they’ve seen something special because it doesn’t happen that often. The same logic applies for Mustafi’s header: it was a low percentage scoring chance and he scored it. We could look at the low xG and instead of saying “xG is broken” we could say “damn, that was a great header and the low xG just shows us how great it was.”
And third, you could just look at expected goals as yet another data point. It doesn’t explain everything but if you see a team scored 5 goals and had an expected goals of 0.4, you may want to go have a look at the shot chart – my guess is that they got some once in a lifetime goals in that match.
All of this just adds more enjoyment to the game. It gives football fans something to talk about and gives us points to argue. It doesn’t make me more right or you more wrong but I do think it’s rather silly to just dismiss it or to say that xG is the death of football.
It’s just data.