Supplement to the Arsecast: more xG than you can shake a stick at

Hey everyone, I had a guest spot on Arseblog’s Arsecast today, and Andrew asked me to help explain xG and whether or not Arsenal are over-performing this season. This post is meant as a supplement to that conversation: to amplify and maybe clarify some of the points I made on the Arsecast. So, erm, yeah, this is what’s happening below.

What is xG? In the early days it was a simple idea. Shot data from all of the top leagues, thousands of shots, showed that certain kinds of shots were scored at a certain frequency. For example, shots from outside the 18 yard box were scored at about a 3% rate, shots inside the 18 yard box at about a 10% rate, and penalties at about a 75% rate.
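As a rough sketch of that early approach, the rates quoted above can be treated as a simple lookup table. The zone names, the `simple_xg` function, and the sample match are all made up for illustration; this is not any real model.

```python
# Scoring rates quoted in the paragraph above (illustrative, not a fitted model).
ZONE_RATES = {
    "outside_box": 0.03,
    "inside_box": 0.10,
    "penalty": 0.75,
}

def simple_xg(zone: str) -> float:
    """Return the historical scoring rate for a shot from the given zone."""
    return ZONE_RATES[zone]

# Hypothetical match: six shots from distance, four inside the box, one penalty.
shots = ["outside_box"] * 6 + ["inside_box"] * 4 + ["penalty"]
match_xg = sum(simple_xg(zone) for zone in shots)
print(f"match xG = {match_xg:.2f}")
```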

But as we got more data, we were able to figure out that other factors mattered as well and were able to refine the formula: the angle of the shot, the number of defenders between the shooter and the goal, whether the chance was a header or a footer, was the shot from a corner, what kind of key pass (throughballs and drag-backs are especially dangerous), was it a fast break, was the attacker 1v1 with the keeper, which league are these players in, and so on.

All of this data has produced a number of very accurate models. My own model explained 90% of the variance. In other words, over the course of a season, taking into account all of the shots in all of the games, its totals tracked the goals actually scored very closely.

The problem with expected goals isn’t the accuracy of the model, it’s the name. It’s a catchy name but it’s misleading. Expected goals is actually just a measure of average shot quality. It should be called something like “chance quality” or “percent chance of scoring” – the bigger the “chance quality” the more likely the player was to score. So, to explain a shot in both terms: if Christian Benteke has a header, wide open, right in front of the goal we “expect” him to score and the shot has a .40 xG or is a “40% chance of scoring”.  

It’s also an odd metric because if you add up all of the chances taken in a match, you can produce a single match xG, which is a bit misleading because it reads like “this is the number of goals you should have scored” but it’s more accurate to say “this is all of the percentages aggregated and it sort of looks like a number of goals but really isn’t”. And we shouldn’t get too hung up on whether the xG matches the aG (actual goals) because in a football match, just like in any game, there is going to be variance.
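To see why a match xG is not a prediction of the score, here is a small sketch that turns a list of shot probabilities into the chance of scoring exactly 0, 1, 2... goals. It assumes every shot is an independent event, which is a simplification, and the five shots are hypothetical (the 0.40 is borrowed from the Benteke header above).

```python
def goal_distribution(shot_xgs):
    """Probability of scoring exactly k goals, treating each shot as an
    independent weighted coin flip with probability equal to its xG."""
    dist = [1.0]  # before any shot: 100% chance of 0 goals
    for p in shot_xgs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)   # this shot is missed
            new[k + 1] += prob * p     # this shot is scored
        dist = new
    return dist

shot_xgs = [0.40, 0.10, 0.10, 0.03, 0.03]  # hypothetical shots in one match
dist = goal_distribution(shot_xgs)

print(f"match xG = {sum(shot_xgs):.2f}")
for goals, prob in enumerate(dist):
    print(f"P({goals} goals) = {prob:.3f}")
```

The same match xG can hide very different spreads: one big chance plus a few speculative efforts behaves differently from a pile of middling shots, even if the totals match.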

If I sit down at a blackjack table and play a basic blackjack strategy, I can reduce the house edge to as little as 0.5%. If I play 100 hands at $1 a hand, one-on-one with the dealer, and I stick to my strategy, I would have an expected loss of just $0.50. However, if I win four of the first five hands I will be “up”; I will be overperforming. You might call this luck, but it’s just variance.
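The blackjack numbers can be sketched in code. This assumes a $1 stake per hand (which is what makes 100 hands at a 0.5% edge come out to $0.50) and treats every hand as a plain even-money win or loss. Real blackjack has pushes, doubles, splits and 3:2 payouts, so this is a toy version meant only to show the spread around the average.

```python
import random

HOUSE_EDGE = 0.005   # the 0.5% edge quoted above
STAKE = 1.0          # assumed: $1 per hand
HANDS = 100

expected_loss = HANDS * STAKE * HOUSE_EDGE   # 100 * 1 * 0.005 = 0.50

def play_session(rng, hands=HANDS, stake=STAKE, edge=HOUSE_EDGE):
    """Net result of one session of simplified even-money hands."""
    p_win = (1 - edge) / 2   # win probability that produces the given edge
    return sum(stake if rng.random() < p_win else -stake for _ in range(hands))

rng = random.Random(2018)   # fixed seed so reruns give the same numbers
sessions = [play_session(rng) for _ in range(10_000)]
average = sum(sessions) / len(sessions)
share_up = sum(s > 0 for s in sessions) / len(sessions)

print(f"expected loss per session: ${expected_loss:.2f}")
print(f"average simulated result:  ${average:.2f}")
print(f"sessions that finished up: {share_up:.0%}")
```

In this toy version a single 100-hand session has a standard deviation of roughly $10, so finishing well up or well down is entirely ordinary even though the long-run average is a fifty-cent loss.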

This happens with the League Table every season:

Source: https://www.reddit.com/r/soccer/comments/40zske/201516_premier_league_breakdown_the_year_so_far/

The same thing happens with expected goals. Which leads me to the idea that Arsenal are overperforming, which is annoying to people and I can understand why. Using xG (from understat.com), Arsenal have scored 9 more goals this season than the aggregate quality of their shots would suggest. These “extra” goals mean Arsenal are also “overperforming” their “expected points” by a little over 7. In short, they are sitting at the blackjack table with a much larger pile of winnings than they probably should have, and the pit boss is hovering. That said, as long as the pit boss doesn’t see any cheating, he’ll just chalk it up to variance.

This variance is typically very high at the start of the football season, because a single match like the one against Fulham, where Arsenal scored five goals on 9 shots with an xG of just 1.4, provides +3.6 goals on its own. The negative way to look at this is that Arsenal were lucky. The positive way to look at it is that Ramsey, Lacazette, and Aubameyang just scored some cracking good goals that beat the odds.

The bugbear in the room is the nasty old “reverting to the mean”, which simply means that the team will start getting goals (and thus points) consistent with the shots they are taking. That might happen and it also might not. I’ve seen teams overperform over a season and underperform over a season. It doesn’t mean that xG is broken any more than winning $100 in a night means that you broke blackjack.

Arsenal are beating the odds so far, but it’s a very small sample. Arsenal have only taken 97 shots. That’s 12th in the League and 77 fewer shots than Man City. But Arsenal have scored 19 goals, 2nd in the League.

Arsenal’s conversion rate is also very high right now, 20%. Average for the last three seasons in the Premier League is 10.5%. But… when Man City won the League last season they also converted at a high rate, 15%. Liverpool converted 13% and when Arsenal won the League in 03/04 we also converted 15% of our shots. I’d say the odds are good that Arsenal won’t score on 20% of their shots for the rest of the season. But if they convert less, they also might start taking more shots, or creating more shots in more dangerous areas.
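For anyone who wants to check the conversion numbers, here is a quick sketch using the 19 goals and 97 shots quoted above. The last step treats every shot as an identical 10.5% chance, which is a big simplification (shots are not all the same quality), so read the result as a rough feel for how unusual the run is and not as a verdict.

```python
from math import comb

shots, goals = 97, 19
league_rate = 0.105   # three-season Premier League average quoted above

conversion = goals / shots                   # Arsenal's rate so far
goals_at_league_rate = shots * league_rate   # goals an average finisher would have

def prob_at_least(k, n, p):
    """Binomial probability of at least k successes in n independent tries."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumption: every shot is an independent 10.5% chance.
tail = prob_at_least(goals, shots, league_rate)

print(f"conversion rate so far: {conversion:.1%}")
print(f"goals at league-average rate: {goals_at_league_rate:.1f}")
print(f"chance of {goals}+ goals from {shots} average shots: {tail:.2%}")
```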

That’s what attacking football teams do, they try to win games. Arsenal are going to try to win games and if that means taking more shots, I have no doubt that they will take more shots. In which case they will probably convert fewer and will “revert to the mean”. Maybe! Maybe not.

But in the case of Arsenal this season, a lot of that crazy high conversion rate is coming from shots outside the box. Arsenal have scored 4 goals on 38 shots outside the 18 yard box. That’s about a 10% conversion rate, more than three times the roughly 3% that shots from distance usually go in at. And there’s another weird thing happening with Arsenal: almost all of their xG “overperformance” (variance) is happening in the 2nd half of matches.

So what, if anything, does all of this mean? 

  1. Arsenal have scored more goals than expected but it’s really early in the season and we are probably just looking at variance. Arsenal have played only 8 games. In one of them they scored 5 goals from an xG of just 1.14, and they also scored 3 goals against Cardiff with an xG of 1.34. That’s two matches with a combined xG of 2.48 but an aG of 8.
  2. 5.52 of their 6.56 “overperformance” of xG comes from those two matches.
  3. Watching the match against Fulham did you really “expect” the Ramsey goal or the Lacazette shot from outside the box? Probably not. Neither did the xG model. THAT IS WHY THEY ARE SPECIAL.
  4. Instead of asking whether the model is broken why not point to those super low xG rates and say “this proves how hard that shot was”? Isn’t that slightly more fun? To live life that way?
  5. If Arsenal do something weird, and score +60 over xG, you can bet that the people who create these models will take a look at why.
  6. Maybe the “wheels will fall off” this Arsenal train and Arsenal will start scoring a lot less. More likely we will see a smooth correction which will either translate into slightly fewer goals or into the same number of goals and more shots taken. 
  7. Teams do sometimes overperform for an entire season. Chelsea were +23 over xG when they won in 2016/17, Man City overperformed xG by 15 last season. Sometimes I win at blackjack. 
  8. I think we will have a clearer picture of how Arsenal play closer to the midway point of the season. 
  9. Maybe you don’t like stats. Cool. Some of us do. And we like to use stats to illustrate certain points. And maybe you hate that and that’s cool too. Don’t read the stories that you don’t like. Or do read them and maybe think about them a bit and try to see things from another point of view. I like both types of stories, as long as they are well written. 
  10. No one is saying that stats tell the whole story. Please, for the love of god, please stop saying that. 
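The arithmetic behind points 1 and 2 is easy to check. This sketch only uses the figures given in the list; the dictionary layout and labels are just for illustration.

```python
# (xG, actual goals) for the two matches described in point 1.
matches = {
    "five-goal match": (1.14, 5),
    "Cardiff": (1.34, 3),
}
season_overperformance = 6.56   # total goals above xG, from point 2

combined_xg = sum(xg for xg, _ in matches.values())
combined_goals = sum(goals for _, goals in matches.values())
two_match_over = combined_goals - combined_xg
share = two_match_over / season_overperformance

print(f"combined xG: {combined_xg:.2f}, combined goals: {combined_goals}")
print(f"overperformance in those two matches: {two_match_over:.2f}")
print(f"share of the season's overperformance: {share:.0%}")
```

Take those two games out and the other six matches account for only about one goal above xG between them.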


Source: Understat.com

16 comments

  1. You guys also discussed Welbeck and Iwobi’s XG involvement. You said they are providing more than Ozil and Ramsey, which is facts for this season. I was wondering if you could compare their XG involvement for this season against those of Ramsey and Ozil from last season.

    I think this might help us understand if we are playing to the strengths of the squad, or if we are rating them this highly irrespective of whether they have improved the team’s XGs or their own. I say last season because Ozil and Ramsey played in their favourite positions.

    I know change is good, but it can’t be hyped up if it’s just change for change’s sake, rather than for the improvement of the team.

  2. Just wanted to mention I really enjoy your work on Arseblog. It was a treat to have you on the actual Arsecast today, one of my favorite episodes in a long time.

  3. Tim, the xG stat would become a lot more accurate if it could be personalized to each individual. For example you mentioned that Lacazette scores at 10% rate from outside the box while the mean from all players is 3% but it is the 3% that is being used to evaluate his xG from shots outside the box.

    Is individual data on xG possible? That would give rise to IxG (Individual expected goal) as opposed to the current “generalized expected goal”.

    1. I think that is a good idea but hard to implement. Two things from top of my head:
      1. Insufficient data; i believe
      2. A player’s IxG would change as they evolve with time, this would make time or seasons a major factor while xG builds more on keeping historical data as a static reference.

      However, one could sort of ‘cluster’ players of the same type, i.e. similar playing style in similar positions and then analyze this aggregated data. I think they call it classification these days. A classified IxG would be great, though.

  4. Thank you for the great post! When a team overperforms with respect to xG, could it just be that their strikers are much better than the average striker so they convert low-xG shots at a higher rate than the average striker? And if this is true then we shouldn’t necessarily expect a regression to the mean.

  5. So Tim, regarding your final point, you’re saying that stats tell the whole story, right? You stats guys are all the same.

  6. Amazing podcast….this post went full maths which I personally liked it…the conclusion though was the best…keep doing what you are doing…do come in regularly in the pod to help fans understand stats..cheers

  7. Forgive me if I missed this, but is there an average season Xg mark that teams that win the league tend to hit? Not how much they overperform by. Kind of a benchmark for chance creation I guess.

    And then, based on past data, how much more would a team need to create, to still win the league despite underperforming on Xg.

    1. Are these stupid questions?

      (PS. Your site is doing the thing again where I have to fill in details every time to comment despite clicking on the option to save them)

      1. This always happens to me when I clear my web browsing history, so perhaps you’ve done that recently.

  8. To look at a specific example – The xG of Lacazette’s shot was 0.03 (or something close to that?). But, the aG was 1.0 So, an overperformance of 0.97

    Bravo Lacazette!

  9. The reason we are over performing our xG is because we get the ball to Lacazette so much quicker now and the team are playing to his strengths. We have curbed our tendency to over elaborate in the final third and are picking out runs better and catching defences out more often.

    Laca is one of the best finishers in the league and his decision making in 18 yard area is superb. If defenders give him an inch he shoots on sight while being sensible enough to tee up a teammate if the shot isn’t on.

    When it comes to the 2nd half ruthlessness, I have to hand it to Emery for being proactive and tweaking his tactics earlier than our opposition and going for the win.

  10. It was a very good listen. Andrew was very skilled in drawing the info out of you, and you were very good in breaking it down simply. Not afraid to say that I wasn’t fully across the xG thing. It’s all clear now, so double thanks, Tim, for both the pod and the writeup above.

    Seems to me that the more players you have capable of outperforming xG that the greater your chances of competing for the title. xG, in a sense, is having people who can create something out of a very, very low yield situation as Laca (2nd goal) and Ramsey did against Fulham. That correct?

    Oh, one more thing. The way people sound is rarely consistent with how they look, or the mental picture you have of them. I’d have bet the farm on you sounding like James Earl Jones or Sam Elliott 🙂

    I jest, only slightly.

  11. The low point of the late-Wenger years was watching the team tootle the ball around the box in the ‘dreaded D shape’… one speed, no movement, no surprises, and no magical Messi type to panic defenses in tight spaces… Shooting (accurately and earlier) from distance seems to be part of making the Arsenal attack less predictable and up-tempo.
