Controlling the Midfield (and why James Milner might not be the answer for Liverpool)

Each sport has its truisms about where the core of winning teams comes from. In baseball it’s “up the middle”, the notion that if you get your defensive players in the middle of the field right, it’s easier to fill the rest. In football, games are won and lost “in the trenches”, where the unappreciated linemen clear holes for skill players to score touchdowns. In soccer, it’s the midfield or, in one of the more delightful sporting clichés, the engine room. Great forwards will not score goals without a solid midfield to move the ball up and give them plenty of touches. A top-class back four can’t hold out for 90 minutes repeatedly if it constantly has to defend passes coming toward its goal. These truisms are widely accepted, but it is pretty hard to use stats to determine which engine rooms are running at top speed and which are bogged down. Hopefully this is a step toward determining that.

First, we need to define where the midfield is. I defined it as the area between 38 and 79 yards from goal, as shown in my advanced soccer graphics representation program. This was simply my decision based on what looked right, and there are probably other, more correct ways of defining it.
Next, we want to determine which stats show whether a team is dominating the midfield. Completions for and against show possession, but we need more. Completion percentage is nice, but it rewards simple, short passes back and forth just inside the area as much as incisive balls through the middle. In the end, I came up with four factors to measure a team’s midfield control.

The first three are simple. One: completions per game. Two: the share of passes that are going backwards, mainly for context. Three: how far the average pass travels.
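As a rough illustration of the three simple factors, here is a minimal Python sketch. Several details are assumptions made for the example and are not the author's actual data format: the `Pass` record, the coordinate system (x as yards from the opponent's goal line, y as yards from the centre of the goal), and the rule that a pass counts as a midfield pass if it *starts* 38 to 79 yards from goal.

```python
import math
from dataclasses import dataclass

# Assumed coordinates (not from the original data): x = yards from the
# opponent's goal line, y = yards left/right of the centre of the goal.
@dataclass
class Pass:
    start: tuple[float, float]
    end: tuple[float, float]
    completed: bool

MIDFIELD = (38.0, 79.0)  # yards from goal, per the definition above


def yards_from_goal(point: tuple[float, float]) -> float:
    """Straight-line distance from the centre of the opponent's goal."""
    return math.hypot(*point)


def in_midfield(p: Pass) -> bool:
    # Assumption: a pass is a midfield pass if it starts inside the band.
    lo, hi = MIDFIELD
    return lo <= yards_from_goal(p.start) <= hi


def simple_factors(passes: list[Pass], games: int) -> dict[str, float]:
    """Completions per game, share of backward passes, and average pass length."""
    mid = [p for p in passes if in_midfield(p)]
    if not mid:
        return {"completions_per_game": 0.0, "backward_share": 0.0, "avg_length": 0.0}
    completions = sum(p.completed for p in mid)
    # A pass is "backward" if it ends farther from goal than it started.
    backward = sum(yards_from_goal(p.end) > yards_from_goal(p.start) for p in mid)
    lengths = [math.dist(p.start, p.end) for p in mid]
    return {
        "completions_per_game": completions / games,
        "backward_share": backward / len(mid),
        "avg_length": sum(lengths) / len(lengths),
    }
```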

 

The fourth is a little more complicated. It is basically an adjusted completion percentage on forward passes. To measure which teams were actually best at moving the ball through the midfield, I created a rough model of how an average team passes. It takes into account how far from goal the pass originates and how much closer to goal the ball travels. I built this separately for La Liga, the Bundesliga, and the EPL. For example, in the EPL a pass that originates 63 yards from goal and is targeted at a player 4 yards closer to goal (59 yards from goal)* is expected to be completed 86% of the time. If a passer is 40 yards from goal and tries to play a ball 26 yards closer to goal (into the box, 14 yards from goal), it is expected to be completed 20% of the time. Obviously these rates change a lot depending on pressure and the number of options available: a striker playing a ball forward will have a lower percentage than a midfielder or a defender simply because of how the team is laid out. This is OK, especially at the team level, as we are simply using the model to measure which teams are actually passing well and which might be inflating their completion percentage through short passes far from goal. We add up each pass’s expected completion probability, then compare that total with how many passes were actually completed to see whether a team is above or below what you would expect.

 

*“From goal” is measured as the straight-line distance to the goal. So a pass completed into the corner would be measured as 30+ yards from goal, not 0, even though it might be completed on the end line.
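The adjusted completion rating can be sketched the same way. This minimal version assumes a hypothetical `table` of league-average completion rates, binned in 5-yard steps by origin distance and by yards gained toward goal. The author's actual model is not published here, so the usage below uses only the two EPL examples quoted above as table entries. The coordinate convention (x from the goal line, y from the centre of goal) is also an assumption.

```python
import math


def yards_from_goal(x: float, y: float) -> float:
    """Straight-line distance to the centre of goal, so a ball on the end
    line near the corner flag is still 30+ yards out (see the footnote)."""
    return math.hypot(x, y)


def expected_completion(origin: float, advance: float,
                        table: dict[tuple[int, int], float],
                        bin_size: float = 5.0) -> float:
    # Hypothetical lookup: league-average completion rate for passes binned
    # by distance from goal at origin and by yards gained toward goal.
    return table[(int(origin // bin_size), int(advance // bin_size))]


def pass_rating(passes, table, midfield=(38.0, 79.0)) -> float:
    """Actual completions divided by summed expected completions.

    `passes` is an iterable of (x0, y0, x1, y1, completed) tuples. Only
    forward passes that start inside the midfield band are rated.
    """
    expected, actual = 0.0, 0
    for x0, y0, x1, y1, completed in passes:
        origin = yards_from_goal(x0, y0)
        advance = origin - yards_from_goal(x1, y1)
        if advance <= 0 or not (midfield[0] <= origin <= midfield[1]):
            continue  # skip backward/sideways passes and non-midfield origins
        expected += expected_completion(origin, advance, table)
        actual += int(completed)
    return actual / expected if expected else float("nan")


# Usage with the two quoted EPL examples: 63 yards out / 4 gained (86%),
# and 40 yards out / 26 gained (20%).
table = {(12, 0): 0.86, (8, 5): 0.20}
passes = [(63, 0, 59, 0, True), (40, 0, 14, 0, False)]
rating = pass_rating(passes, table)  # 1 completion / 1.06 expected
```

A rating of 1.0 means a team completes exactly as many forward midfield passes as the average-team model predicts. A team at about 0.89 would be roughly 11% below that.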

To visualize all of these factors, we go to Tableau and graph the three biggest leagues:

Far on the left side of the graph we see Crystal Palace, Burnley and Eibar. These are the three teams who completed far fewer passes than you’d expect an average team in their leagues to complete: they were only about 89% as likely to complete any given pass as the normal team. Moving from left to right, we see teams like Newcastle, Atletico Madrid, and Mainz around the average line for pass completion quality. Far on the right, we see the expected big boys in Bayern, Barcelona, and Real Madrid. Gladbach, Everton, and both Manchester teams sit significantly behind those three, in the second tier of this pass rating.

Looking at the bottom of the graph, we see Man City and Arsenal in a group of their own when it comes to playing short passes. Up top, three German teams play the longest passes, with varying rates of success: the average midfield pass for Paderborn, Mainz and Wolfsburg is over 5 yards longer than Man City’s.

Looking at the size of the bubbles, we see, unsurprisingly, that the best teams at completing passes are generally the ones who complete the most. One place we can see a contrast is between Tottenham and Atletico Madrid, who play similarly short passes at similar success rates; the difference is that Spurs complete almost 40 more passes per game in the midfield.

The color of the bubble shows the share of passes that go backwards. Swansea and Manchester United are teams in the right half who play backwards passes more than anyone else; in fact, Manchester United play the highest share of backwards midfield passes of any team on this chart. As you can see, this is rare for a top team, and it suggests a lack of forward options, a lack of aggression, or a tactic obsessed with keeping the ball.

 

 

Here is the defensive chart with a clickable link for more interaction:
https://public.tableau.com/profile/hogtrough#!/vizhome/midd/Sheet1

 

We see two massive outliers immediately. One is Leverkusen, who were just enormously harder to pass through in the midfield than anyone else. The other is the infinitesimal dot representing Bayern: teams complete 40 more midfield passes per game against Man City than they do against Bayern. Two interesting teams to contrast are Manchester United and Rayo Vallecano. They face the same number of passes, are both very good at stopping passes, and allow slightly above-average pass distance. The main difference is that teams play forward a ton against Rayo (because they press extremely high), while opponents play backwards a lot against United.

Still, the single most interesting part of this graph is Real Madrid. Teams play extremely short passes against them while completing more than you would expect. This was not something I picked up on while watching, and it is hard to explain away as a tactical decision in a league where they are simply so much better than many of their opponents. Something was wrong with Real’s defensive midfield last season, and that looks to be a pretty big hole going forward for a team with UCL and La Liga ambitions.

Chelsea are somewhat close to Madrid, down near Swansea. This is more easily explained as a tactical decision: we know from my previous piece on converting passes into shots that Chelsea are one of the best at keeping teams at arm’s length, on the edge of the attacking area, and one of the best at keeping passes from being converted into shots.

The longest passes allowed are generally by German teams (see below for more on league differences), followed by some bad Spanish teams and then Tottenham, who sit right beside Augsburg. Only Man United and, strangely, QPR are better than Tottenham at stopping passes through the midfield. The main problem with Tottenham’s defense was that the passes that got through were long and dangerous, and were converted into shots at a higher rate than against any other EPL team. At first glance this suggests the backline is more of a problem than the midfield. United had similar problems, though they were tougher to pass against and not nearly as susceptible to passes being converted into shots.

 

 

Combining shot conversion and midfield control

We saw how Chelsea’s unimpressive defensive midfield numbers were offset by the sterling job they do stopping deep passes from being turned into shots. Let’s see if there are other interesting separations.

There are obvious tactical reasons for some of these (Gladbach’s shell, Celta and Rayo’s high presses), but there are some general conclusions we can draw. If my team were in the second group, I would look first to upgrade my backline if I wanted to improve my defense.

 

 

Combining offense and defense for total control of the midfield
To see which teams really control the midfield, as the title promises, we combine the offensive and defensive metrics. The ratio of completions to completions allowed and the pass ratings on offense and defense are combined into one ranking.
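The article does not say how the three components are weighted. One plausible approach, assumed here purely for illustration, is to standardize each component across teams and add the z-scores with equal weight. The defensive pass rating (how well opponents pass against you) is subtracted because lower is better. The function names and inputs are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values: dict[str, float]) -> dict[str, float]:
    """Standardize a team -> value mapping (population standard deviation)."""
    vals = list(values.values())
    mu, sd = mean(vals), pstdev(vals)
    return {k: (v - mu) / sd if sd else 0.0 for k, v in values.items()}


def control_ranking(completions_for: dict[str, float],
                    completions_against: dict[str, float],
                    off_rating: dict[str, float],
                    def_rating: dict[str, float]) -> list[str]:
    """Rank teams by an equal-weight sum of standardized components (an assumption).

    def_rating is the pass rating *allowed*, so it is subtracted.
    """
    ratio = {t: completions_for[t] / completions_against[t] for t in completions_for}
    z_ratio, z_off, z_def = zscores(ratio), zscores(off_rating), zscores(def_rating)
    score = {t: z_ratio[t] + z_off[t] - z_def[t] for t in ratio}
    return sorted(score, key=score.get, reverse=True)
```

Equal weights are only a starting point. A different weighting could reorder teams near the middle of the table.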

Top 10
1. Bayern Munich
2. Barcelona
3. Dortmund
4. Manchester United
5. Real Madrid
6. Manchester City
7. Celta Vigo
8. Arsenal
9. Liverpool
10. Tottenham

Real Madrid’s poor defensive showing is outweighed by its dominant offense. The rankings lend some weight to the idea that a good midfield will build you a good team. One interesting team missing from the top 10 is Chelsea, who were 15th overall but still won the league without a dominant midfield.

Looking at individual teams

When you see a team rank high or low, the next question becomes why. What players are dominating the midfield for them? While this is still a very hard question, and one I am in no way certain I can answer, looking deeper at this kind of passing data can tell us a little.

We will look quickly at Man City and Liverpool, two teams who were both easily above average in number of passes and pass rating (completed passes compared to “expected” completions). We won’t look at defenders (though I will mention that Mamadou Sakho was nearly off the charts in how well he advanced the ball aggressively) or forwards (where the differences between Edin Dzeko, Stevan Jovetic and Aguero are very noticeable), but will focus only on midfielders for now.

The midfield pass rate is basically how well a player does at completing passes that move his team toward goal in the midfield. A rating of 1 means he does exactly as well as an average EPL player. As you can see, everyone here has a rating above 1 except Milner, who is 6 points below the average EPL player when it comes to completing these passes. His role was obviously much different at City than it will be at Liverpool, but the number remains a big worry for Liverpool fans: he will now be featured in an area where he really struggled to move the ball last season. When you factor in every pass over the whole field (overall pass rate), Milner rises above average, indicating he was at his best in the final third. His volume of work will drop there and rise in the center of the pitch in the upcoming season. Of course, more than half of the game is missing here, but defensive work will have to wait for another time and another article.

Other interesting player notes: Jordan Ibe’s high rating in limited minutes bodes well for his future, and it’s another reminder of how silly good Yaya Toure and David Silva are. Liverpool as a whole saw their pass ratings drop the further upfield they got. That is no surprise to Liverpool fans, who watched them play an extremely conservative style for most of 2015, committing very few players forward. The limited attacking options made it very hard to pass, so it will be interesting to check in on Sterling at Man City, and on Coutinho with more options around him, to see if they raise their ratings.

 

This is a broad overview of midfields. There are probably 20 articles to be written on Liverpool alone, and there are tons of ways of looking deeper: who is forcing teams to play through the edges (hint: Villarreal), how things change game by game throughout the season, why Liverpool had such poor midfield numbers against Tottenham and great ones against Chelsea, why they were awful at home against City and great away, and so on. Hopefully you enjoyed this start. For any questions, comments, criticisms, etc., feel free to reach me on Twitter @Saturdayoncouch or post in the relatively new comment box below, and I will be glad to discuss. Spammers, if you have read this far, I am all set on sunglasses, so please do not post.

 

 

Postscript comparing leagues

I promised a breakdown between leagues but ran out of time. Here is a quick graph comparing completion percentages for passes of different lengths. Passes are noticeably harder to complete in the Bundesliga. La Liga tends to see more short passes, and the Bundesliga more long passes. Maybe we can expand on this another time, but there’s never enough time, right?

Converting Dangerous Passing into Shots

When watching the Milan-Torino film for my last piece, I got the idea to look deeper into passes into dangerous areas. Torino had put the ball into dangerous spots a lot in the first half, but a check of the shot map made their half look harmless when it had actually been anything but. That led to a lot of research and then to this post, where I’ll look at how often passes into dangerous areas are converted into shots, the difference between assisted and unassisted shots, which teams do this well and poorly, and what we can do with all this new data. Hopefully you will find this as fascinating as I did.

 

 

What are dangerous, or Very Deep passes?

Passes played very deep into the opponent’s area.

Okay, smartass, what is this area you are defining as Very Deep?

The area roughly outlined by the blue lines here: within 15 yards of the opponent’s goal.

 

Okay, what is so special about this area? I am tired of learning about new areas and terms. Why should I put this one in my memory bank?

To ease understanding of the rest of the article. There is no special reason I chose exactly 15 other than that it’s a clear number. Teams converted 18% of shots from this area, compared to 6% in the 15-25 yard range; that gap would likely hold for a 14- or 16-yard cutoff too, but I chose 15. “Very Deep” is also easier to say than “0-15 yards from the opposing goal” every time.
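To make the zones concrete, here is a minimal classifier sketch. It treats the areas as straight-line distance bands from the centre of the goal, consistent with the distance footnote in the midfield piece. The blue-line area in the original graphic may not be exactly circular, and the x/y coordinate convention is an assumption.

```python
import math


def zone(x: float, y: float) -> str:
    """Classify a point by straight-line distance from the centre of goal.

    x, y are assumed coordinates in yards (x: distance from the goal line,
    y: distance from the centre of the goal).
    """
    d = math.hypot(x, y)
    if d <= 15:
        return "very deep"  # the blue area
    if d <= 25:
        return "deep"       # the 15-25 yard band
    return "outside"
```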

 

How often do teams pass into this area?

I see where you are headed. To pre-empt the next few questions, here is a general table:

If you complete a pass in this area and get a shot off, then unsurprisingly it is a golden chance. It follows that turning more of these passes into shots is one of the best ways to improve your offense (or vice versa for defense).

 

A few quick best and worst lists to provide some context.

 

Offense:

and defense:

First, to explain the difference between the two right-most columns: assisted shots/completion is how often a completed pass turns into a shot, and total shots/pass is all shots (assisted and unassisted) divided by total passes into the area.
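The two columns use different denominators: completions for the first, all pass attempts for the second. A short sketch makes that explicit. The function name and the counts in the usage line are hypothetical.

```python
def very_deep_rates(completions: int, attempts: int,
                    assisted_shots: int, unassisted_shots: int) -> tuple[float, float]:
    """Return (assisted shots per completion, total shots per pass attempt)."""
    # Only completed passes can produce an assisted shot.
    assisted_per_completion = assisted_shots / completions
    # Every attempt into the area counts in the denominator here, and
    # unassisted shots (dribbles, loose balls, rebounds) are included.
    total_per_pass = (assisted_shots + unassisted_shots) / attempts
    return assisted_per_completion, total_per_pass


# Hypothetical season: 200 attempts, 50 completions, 20 assisted and 10 unassisted shots.
rates = very_deep_rates(50, 200, 20, 10)
```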

A couple of interesting observations: Manchester United’s soft underbelly shows up here. They are nearly the easiest team to complete passes against in this area and then allow nearly the highest shots/pass of any team. Those are simply unacceptably bad rates for a team with their payroll. Dortmund allow the fewest completions per game, yet when passes are completed against them, they are converted into shots at the 4th-highest rate. This makes me think there is very high pressure on the players attempting the pass and maybe higher-risk defending on the recipient.

 

Ok, so where do passes into the very deep area generally come from?

 

(The two sides of the pitch have been equalized to hopefully make this easier to read, as the differences between sides were not significant. The huge box with 13 in it is the passer’s entire own half; the rest of the pitch, the opposition half, is broken down into smaller zones.)

We see that passes into the Very Deep area come primarily from the sides: in a season, an average team will play 350 passes from the corners of the pitch into this area (not counting actual corner kicks).

 

Not all Very Deep passes are created equal, are they?

 

No they are not. Here is how often they are converted directly into assisted shots:

 

So, as expected, it’s better to play a shorter pass from the middle of the field into the area than to hoof one in from out wide. Only 1 out of 29 passes from the corners turns into an assisted shot, while 1 out of 8 from the middle outside the box does.

 

That’s a European average though, does it mask differences from league to league?

 

As you’d suspect from the leader charts above, it absolutely does. Let’s look at the Bundesliga vs Ligue 1:

 

We see that long passes from the sides of the pitch are turned into shots over twice as often in Germany as in France. This variation in styles between leagues (which also shows up in shots) is one of the more interesting questions in football data. Teams in the Bundesliga press out a lot more than teams in Ligue 1. That leads to some of the differences we see here and causes pesky problems for anyone trying to build any sort of expected goals (or, as I briefly and madly considered, expected shots *shudders*) model that covers more than one league. Pressure is the missing component, I assume, as there are simply fewer defenders covering pass recipients in the Bundesliga than in France. Until we can account for that, differences in playing styles across teams and leagues will continue to trouble global model-makers, a group in which I semi-proudly claim membership.

 

 

A few interesting team maps before we move on, starting with Barcelona. Barcelona attempted passes from the deep sides of the pitch near the halfway line less often than any other team in Europe. They tried only 6 passes from there, understandably preferring to work the ball through the center.

 

Mainz on the other hand tried 47 passes from the sides near the halfway line:

 

Dortmund rarely allowed any dangerous passes, though as we saw the very few that were completed caused major damage.

 

Swansea, meanwhile, allowed 139 passes that originated inside the box, the most in Europe:

I know you just dismissed an Expected Shots model, but let’s see it in action anyway.

If you insist. At the least it leads us to some interesting conclusions, even if it is simply an assisted shots model. It is a very simple setup: a pass from the center square is given a 12.5% chance of leading to an assisted shot, a pass from the corners 3.5%, and so on. You can look back at the graph of the % of Very Deep passes leading directly to shots, a few graphs up, to see the rest. Comparing actual shots with “expected” shots tells us which offense was best at turning dangerous pass attempts into very dangerous shot attempts, and which defense was best at defusing a bombardment of its box by limiting the number of shots.
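The model described above is a sum of per-zone probabilities. Here is a minimal sketch. Only the two rates quoted in the text are filled in, because the other zones' values appear only in the original graphic. The zone labels `"center"` and `"corner"` are hypothetical names for those two areas.

```python
from collections import Counter

# Rates quoted in the text; the remaining origin zones are omitted because
# their values live only in the original graphic.
ASSISTED_SHOT_RATE = {"center": 0.125, "corner": 0.035}


def extra_assisted_shots(pass_origins: list[str], actual_assisted_shots: int,
                         rates: dict[str, float] = ASSISTED_SHOT_RATE) -> float:
    """Actual minus expected assisted shots from Very Deep pass attempts.

    Positive means a team produced more assisted shots than the origins of
    its passes would predict (for a defense, it means it allowed more).
    """
    counts = Counter(pass_origins)
    expected = sum(n * rates[zone] for zone, n in counts.items())
    return actual_assisted_shots - expected


# Hypothetical example: 8 passes from the center square, 20 from the corners,
# 3 assisted shots actually produced.
extra = extra_assisted_shots(["center"] * 8 + ["corner"] * 20, 3)
```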

 

As expected, a lot of German teams are great at converting their passes into shots above the rate we would expect based on where those passes came from, starting with Wolfsburg, who took 32 more assisted, close-range shots than we would expect from their pass numbers:

 

So if Athletic could convert passes into shots like Real Madrid, they would have added 50 more assisted shots to their total. If they converted those at the European average rate of 40%, they would have scored 20 more goals and possibly challenged for the Champions League. I think this means Athletic’s attacking line is holding back a team with the potential to easily score 50+ goals. They have enough dangerous possession to be doing much better. If I were looking for an offensive upgrade, I would look at those involved in the final ball, as the players behind them are being let down by those in front.

 

For defenses, I will split the list up by league so it’s not just a top 10 of German teams allowing a high rate of shot conversion (the number is the “extra” shots allowed above or below what would be expected from pass totals and origins):

Here we see a possible reason why Sunderland stayed up this season. They are a surprising name to see on the left side of this table, and facing 12 fewer high-quality chances than expected was a big boost toward gaining the 3 key points between survival and the Championship.

 

We can look deeper into each league with the following images and interactive links. In Germany, Bayern and Dortmund sit high on the y-axis, well ahead of the pack in sending passes into the box. Gladbach’s extreme efficiency shows up: they are near the bottom in Very Deep passes attempted, but best in the league at converting passes into shots. The dark color of their circle shows it’s not just that their passes come from high-quality areas; they are simply converting passes into shots at a higher rate than anyone else in the league. The size of the circle shows they are converting those shots into goals at a league-leading rate as well.

Bundesliga Offense

 

In the EPL we have more uniform conversion rates outside of Man City.

EPL Offense

 

In La Liga we see massive spreads in conversion rate and very deep passes per game. This wide spread has always made this league the toughest to model correctly.

La Liga Offense

 

An interesting case is Serie A’s defenses. Six teams allow essentially the same number of passes into the box. What separates Juve, Lazio and Roma (the top 3 teams in the table, with the fewest goals allowed) from Inter, Fiorentina and Napoli is that they turn those passes into fewer shots allowed.

Serie A Defense

Looking one step back

What about the yellow area (15-25 yards out from goal)? It’s not nearly as dangerous: only 22% of completions turn into assisted shots, and 10% of assisted shots are turned into goals (unassisted shots clock in at 3%). There are still interesting things to learn from this area. I looked at the ratio of completions in the yellow area (deep) to the blue area (very deep), thinking teams that build a shell around their goal might keep the action at arm’s length.
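The deep-to-very-deep ratio can be sketched in a few lines. This assumes you already have each completion's end point as a straight-line distance from goal, using the same 15- and 25-yard cutoffs. The function name is hypothetical. Computed on completions allowed, a high ratio would fit the "shell" idea: opponents get close, but rarely all the way in.

```python
def deep_to_very_deep_ratio(completion_end_distances: list[float]) -> float:
    """Completions ending 15-25 yards from goal per completion ending within 15."""
    very_deep = sum(d <= 15 for d in completion_end_distances)
    deep = sum(15 < d <= 25 for d in completion_end_distances)
    # With no Very Deep completions at all, the ratio is unbounded.
    return deep / very_deep if very_deep else float("inf")
```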

 

 

We see a familiar face at the top in Chelsea. Mourinho’s wall around the goal is a familiar, frustrating sight for EPL teams and it’s borne out here. You can get close, but not close enough to get a really high quality shot.

 

At the other end, Leverkusen try not to let teams get close to their goal, but when teams do get close, they don’t slow down on the doorstep; they barrel down on goal. Teams with a high completion ratio (like Chelsea and Everton) are also generally better at keeping their opponents from converting passes into shots. On offense that relationship does not hold, though Chelsea and Everton are again near the top.

The offensive top 10s look like this:

What are the uses?

These metrics can help evaluate where your team’s attack or defense is being let down. We looked at Dortmund’s defense earlier and identified the final link as the weakest: they allow an extremely low number of very deep passes and a low completion rate, but those completions turn into shots way too often. If they could get that rate down to even the European average, they could face 35 fewer shots. The metrics also give us a glance at where teams are getting the ball into dangerous spots and how well they are converting those passes.
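The "fewer shots faced" estimate rests on simple arithmetic. The symbols here are introduced only for illustration, and the actual Dortmund inputs are not reproduced in this piece. Let $C$ be the completions allowed into the Very Deep area, $r_{\text{team}}$ the team's shots allowed per completion, and $r_{\text{avg}}$ the European average. The shots saved by matching the average would then be roughly:

```latex
\Delta S \approx C \times \left( r_{\text{team}} - r_{\text{avg}} \right)
```

Because $C$ is already small for Dortmund, a sizeable $\Delta S$ implies their per-completion rate sits well above the average.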

 

What is missing and what is the next step?

The obvious next step is looking at which of these metrics are repeatable and which are most influenced by luck. Right now, I have no statistical backing to say which is which but I am pretty confident that simply based on larger sample size, pass-to-shot conversion rates are much more indicative of true skill than shot-to-goal conversion rates.

 

Applying these metrics to individual players would be an interesting use. If Leighton Baines’s crosses are converted into shots at a significantly higher rate than everyone else’s, that is useful information. The same goes for strikers: if Aubameyang is converting passes into shots at an elevated rate while Ramos and Immobile are not, then he might be the reason, not the supply. Defenders can be looked at as well, though as usual that comes with a lot of complexity.

 

 

One problem with these metrics is the difficulty of classifying unassisted shots. Some of them likely came from passes just outside the area I selected as a cutoff, but it is hard to determine which ones come from dribbles, loose balls, rebounds, etc. That would add useful information. It almost doesn’t need to be stated at this point, but pressure data would totally change how we look at metrics like these and at shots. Without a huge manual project, it’s unlikely we will have Europe-wide pressure stats anytime soon.

Of course, the ultimate end point is shots to goals, but I felt the piece was already long enough and I don’t have as much new to add in that area. There has already been plenty of good work (like this piece from Michael Caley) that has established a lot there. Classifying the % of shots taken as headers from each area would be another improvement I could make.

Thanks for sticking with this piece through the stats and the tables. Hopefully you enjoyed it or it sparked an idea for you. If you have comments, critiques, ideas, or anything else you can reach me @SaturdayonCouch on twitter or you can post a comment on my website.

 

Thanks for reading StatsBomb!

Powered by Opta

 

Match Day Analysis: going granular to try and gain the extra 2%

 

This post will dive into the granularity of a single match. I think single-match stats, trends and data can get overlooked at times in the soccer analytics community. The main reason is that goals, expected goals, saves, and shots need a long time to stabilize at the team level and especially at the individual level. Those numbers aren’t very useful for planning a single match, which is why we will turn to ball movement stats for this article. Soccer may be a “thin data” sport when it comes to actually scoring, but it’s an extremely data-rich sport elsewhere. One match delivers more information in soccer than in any other sport because of how many individual actions are made. This piece shows how an analyst can hopefully use some of that information to help his team prepare and adjust within a single game. The goal is not to analyze one Milan-Torino match and be done, but to illuminate how stats and analysis can be of great help to teams on a time frame of minutes and days, not just months and years.

 

Before any of this analysis becomes useful for a team, club analysts have to have good relationships with the coaching staff and players. They could be creating impeccable game plans, but if the work is dry and technical and they have no relationship with the playing staff, it will not get implemented. The Pittsburgh Pirates found real success blending analytics into the team. They had an analytics staffer travel with the team, sit in on every pre-game meeting with the coaching staff, and spend more time in the clubhouse. This led to great two-way communication. Experienced baseball coaches could get to know the analytics staff and begin to trust them more, and players and staff could ask questions that led the analysts to places that would never have occurred to them without this interaction. This relationship often means presenting your findings differently than you might at an academic conference:

 

“They had to democratize the data and turn it into something that not only stat wonks understood, but athletes, too. Fox and Fitzgerald knew they might lose players if they just passed along numerical data. They had learned in their limited conversations with players that they absorbed visual materials amazingly fast and retained the information.”

 

-from the recommended book Big Data Baseball.

The need for communication is key in soccer as well. Carles Planchart is Bayern’s head analyst and he puts it this way: “The most efficient method is to show them visual images, because that gets the idea across very quickly.” Planchart discussed his and Pep’s halftime routine in the book Pep Confidential:

 

“He usually picks three or four concepts to cover at half-time and will use two or three three-second videos to demonstrate what he means about each one. In total, there will be rapid shots of roughly 10 specific moves. ‘What are we doing at half-time? That’ll be Pep’s first question. He comes into his office and asks, “what are you seeing up there?” Because you get a completely different view of the action from up above and you spot different things. He always listens attentively when I make my report.”

 

 

For soccer analysts at a club, detailed single-match work like this is where most of your time and impact should be. Even turning 2% of losses into draws and 2% of draws into wins could make a big difference in the table and the club’s finances, in a way that I don’t think is as easy to achieve through analysis in the transfer market at this point.

 

In this example, I am hypothetically working as one of the defensive analysts for AC Milan as we prepare for our match against Torino in Serie A. I say “one of the defensive analysts” because any large team should have far more analytics staff than clubs seem to have now. The Pirates have 7 people on their analytics staff and, according to Forbes, revenues of $229 million. AC Milan have revenues of $339 million and play a sport that is probably 25 times harder to analyze and 25 years behind in terms of statistical work done. If the Pirates can hire 5 full-time guys and 2 part-timers just for analytics, there’s no reason AC Milan can’t have a dozen analysts working for them. Anyway, for this piece I’m imagining I am one of those guys.

Build-Up

While some of my colleagues review our previous match, it falls to me to prepare a basic scouting report on Torino’s attack. I start with a Style Profile featuring their offense and our defense:

 

The two main characteristics our coaches need to know are that Torino are off the charts at playing centrally in the final third and at playing deep in their own half. The coaches can then decide whether to press high to try to kickstart the offense, or to sit back and avoid being stretched. Stretching opponents is what Torino coach Giampiero Ventura always tries to do (quote from his thesis: “possession of the ball aims to attract the opposing players in one area of the field to take advantage of the spaces that are created in other areas”).

Next, we want to see how Torino plays against teams similar to our defense. We know from our previous analysis of similarity scores that our defense plays similarly to Atalanta, Sampdoria, and Verona and is closely related to Cagliari, Chievo, and Palermo. Stuffing down our disgust that mighty AC Milan is grouped in with these teams, we analyze how Torino approached those teams, especially when playing away.

 

We see they pass even deeper in their own half (vertical line represents about midfield)

and pass even more centrally

 

This just drives home the point that we need a coherent plan for defending their build-up play. Our deepest midfielders can be used to force play wide, or we can try to break up the moves by pushing higher with our attackers. We aren’t an accomplished pressing team, so this might be given high priority in training. We want to force teams away from what they want to do.

 

 

The major differences vs similar defenses: lower shot tempo, longer passes into the box, and lower share of their passes in the “red zone” (within 25 yards of the opposing goal).

From this we can conclude they will not be flooding the box with numbers. This likely ties in with their defensive strategy, which is similar to Gladbach’s (written about here): they sit well back (4th percentile press) and stop passes close to goal. They will pick their spots to attack and will use Omar El Kaddouri to spearhead those attacks.

 

Against Chievo in their last game, he completed 18 passes to their two forwards, all of them in the center of the pitch. No other player completed more than 9 passes to the forwards, and the two forwards completed only 1 pass to each other. Here is an example of where he was feeding the forwards:

 

and here is how he was supplied (map via analyst Tom Worville):

 

A player who is not going to start the match will be briefed on how El Kaddouri functions and will simulate him in training throughout the week so our starters can get to know his game.

 

Some of the rest of the staff will be working on building shorter versions of videos like this one:

2-3 minutes of build-up from their most recent games and from games against similar defenses, plus 2-3 minutes from each game showcasing their attacking moves. If our manager says he wants to see Torino against other styles because he is considering a tactical change, we can compile reports and videos on those as well.

Summary

-they will play from deep

-will target center of final third

-will use El Kaddouri to get ball there

-won’t commit big numbers to attacks

 

 

 

Match Day

All of our systems are hooked up and churning, and the defensive staff is prepared and working during the first 45 minutes. Noteworthy moments are clipped and sent to the coaches’ computers instantly so they can view them at halftime. A box score, maps and anything else of interest we can show them are collected and sent down in a package that can be digested in a few minutes.

 

We are winning 1-0 at the break but there are worrying signs on the pitch. They have not generated a lot of shots against us (0.34 raw expG) but the ball has gotten to dangerous positions too often and our right back was sent off during one of those.

 

The basic box score looks like this:

The highlighted areas are the things we think stick out, with a few comments below.

 

We also send down our pressure map:

This assumes the club has a video system that can track the ball and players at the same time. If it does not, this is prime intern duty.

We attach a second map with key players we want to point out, as the entire map can be overwhelming.

 

 

We want to emphasize that De Jong and Van Ginkel seem to be overlapping on the left side and rarely show up on the right side of central midfield when Torino attack through there. Poli is roaming all over the place and Zaccardo is completely nonexistent. Torino’s pass breakdown is 66/105/71 from left to right, so it’s not their offense that explains all our pressure lining up on the left. Cross-checking with our offensive touch map will likely show our offense is bogging down there with too many people in the same area, leaving openings on defense when we do lose the ball.
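A quick way to sanity-check that claim is to turn the 66/105/71 split into shares of Torino's passes. This is a minimal sketch; the channel names are just labels for the three vertical strips of the pitch, ordered left to right as in the breakdown, and nothing here comes from actual match-day tooling.

```python
def channel_shares(counts):
    """Return each channel's share of all passes, rounded to 3 decimal places."""
    total = sum(counts.values())
    return {channel: round(n / total, 3) for channel, n in counts.items()}


# Torino first-half passes by channel, left to right
torino = {"left": 66, "center": 105, "right": 71}
shares = channel_shares(torino)
print(shares)
```

The left channel carries about 27% of their passes, against roughly 29% on the right and 43% through the middle, so Torino's attack is not tilted toward the side where our pressure is piling up.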

 

We include their average on-the-ball positions in attack, with the shading showing how much pressure they were usually under in the first half.

 

We include their forward movement/attacking passing map (thick end is end of pass). Blue is complete, red incomplete.

 

 

We include their key build-up combinations and where they take place on average. These are the most common forward passes made by Torino. We have a clickable option to expand each build-up combination and see what happened in the ensuing possession.

 

If you want more information about the 5 passes in the Moretti-El Kaddouri combination, each pass is described and linked to video, so you can click on a time and the video plays from there:

5:16 (starts a move that winds up with Poli intercepting a pass just outside the Milan box)

18:44 (move broken up as De Jong takes ball away in Torino half, directly leads to contested shot on break by De Jong)

30:39 (starts break that leads to speculative Martinez long shot)

31:22 (Honda tackles El Kaddouri and quickly takes the ball; break looks semi-promising but fizzles out)

43:47 (El Kaddouri finds Amauri on edge of 6-yard box, who plays dangerous ball across goalmouth).

 

 

 

The final map is one of dangerous dribbles. El Kaddouri is in blue, Molinaro is in orange. Thickness is how many defenders they bypass. Milan goal on left, thin end is end of dribble. We see the long dribbles coming through the middle, weighted toward the right side.

 

Finally we want to link to some key moves in the first half.

 

Too many players get up the left side, with De Jong also on the left behind Van Ginkel. Space opens for a driving dribble by El Kaddouri

Another, when the middle of the field was too open, allowing Molinaro to dribble in

 

Here is the press working well with 6 players involved leading to a chance for Milan

and again the press working

 

An example of one man (Bocchetti) being late, the press breaking down and space opening up

The press is cut through down the middle

Not enough players commit to the press; it is cut through down the middle, leading to the red card

 

 

Ideally all of this and the offensive material can be scanned over in the first 5-6 minutes of halftime which then allows the manager 10 minutes to make his adjustments.

 

Our suggestions

-press together or not at all; a half-hearted press leaves the middle wide open

-Van Ginkel and De Jong are overlapping too much defensively, leaving the right side open. Zaccardo is not getting enough done on the right side. They are trying to get space for dribbles and quick passes to the middle of the pitch and we are almost inviting it right now.

 

 

 

Conclusions

This kind of analysis doesn’t lead to quick, sweeping conclusions about teams or players, but it is the nitty-gritty of what club analysts should be (and probably most are) doing leading up to and during a game. The extra 2% is what stops attacks, turns losses into draws, and turns 7th place into 5th place. Hopefully you enjoyed this look, and I’d love to hear your thoughts on what analysts should be doing leading up to and during a game, on twitter @SaturdayonCouch or at my website where you can comment on this article.

 

 

 

 

A Family Tree of European Offenses

 

In my previous two posts on StatsBomb, I have used passing data to create team profiles and find similarity scores and then built on that to create a family tree relating how teams defend all across Europe. Today brings the offensive side of the ball to the forefront. You can read the full process of how these metrics are created and related in the previous two pieces but I will give a quick run-through here.

 

The metrics used are:

Shot tempo: shots per pass (3 highest tempos: QPR, Leverkusen, Crystal Palace. Lowest: PSG, Bayern, Manchester United)

Box activity: how often a team passes into the box per game (3 highest: Bayern, Dortmund, Man City. Lowest: Nantes, Bastia, Cordoba)

Intra-box success rate: completion % of passes that start and end inside the box (3 highest: Man City, Lyon, Bordeaux. Lowest: Hertha, Koln, Athletic Bilbao)

Centrality: % of completions in middle of pitch when in final third (3 highest: Torino, Dortmund, Hoffenheim. Lowest: Real Sociedad, Atletico Madrid, Levante)

Possession: share of possession (3 highest: Bayern, Barcelona, PSG. Lowest: Palace, Hertha, Eibar)

Forward play: % of completions that are forward (3 highest: Paderborn, Marseille, Hoffenheim. Lowest: Manchester United, Roma, Manchester City)

Field tilt: how far up the pitch the average pass is completed (Highest: Man City, Barcelona, Chelsea. Lowest: Augsburg, Torino, Rennes)

 

 

 

And a few new ones for offense.

Instead of a simple long ball %, two metrics replace it:

Penalty box entry length: the average distance of a pass in which a team enters the box (Shortest: Barcelona, Arsenal, Manchester City. Longest: Eibar, Evian, Levante)

Playout length: average length of completions from deep in own half (shortest: PSG, Cagliari, Inter. Longest: Burnley, Eibar, QPR)

 

Also new:

Red Zone%: % of passes that are completed within 20 yards of the opposition goal (highest: Man City, Leverkusen, Burnley. Lowest: Cordoba, Elche, Levante)

Diagonals: added thanks to this treatise from Adin Osmanbasic, this measures what % of a team’s passes are long and diagonal (highest: Bayern, Lyon, Lazio. Lowest: Palace, Palermo, Sunderland)
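To make a few of these definitions concrete, here is a minimal sketch that computes shot tempo, penalty box entry length, playout length and Red Zone % from a list of passes. It assumes a 120 x 80 yard pitch with the team attacking toward x = 120, measures distance to the centre of the goal, and treats passes starting within 40 yards of a team's own goal line as "deep in own half"; the coordinates, the `Pass` class and the `deep_x` cutoff are illustrative assumptions, not the author's exact definitions. The red-zone radius is a parameter because the piece describes it as 20 yards here and roughly 25 yards elsewhere.

```python
import math
from dataclasses import dataclass

PITCH_LENGTH = 120.0  # assumed pitch size in yards; the team attacks toward x = 120
PITCH_WIDTH = 80.0
GOAL = (PITCH_LENGTH, PITCH_WIDTH / 2)


@dataclass
class Pass:
    x: float          # start location
    y: float
    end_x: float      # target location
    end_y: float
    completed: bool


def dist_to_goal(x, y):
    """Straight-line distance from a point to the centre of the opposition goal."""
    return math.hypot(GOAL[0] - x, GOAL[1] - y)


def in_box(x, y):
    """Penalty box: 18 yards deep and 44 yards wide, centred on the goal."""
    return x >= PITCH_LENGTH - 18 and abs(y - GOAL[1]) <= 22


def length(p):
    return math.hypot(p.end_x - p.x, p.end_y - p.y)


def mean(values):
    return sum(values) / len(values) if values else float("nan")


def offensive_metrics(passes, shots, red_zone_yards=20.0, deep_x=40.0):
    done = [p for p in passes if p.completed]
    # completed passes that start outside the box and end inside it
    entries = [p for p in done if not in_box(p.x, p.y) and in_box(p.end_x, p.end_y)]
    # completed passes that start deep in the team's own half (hypothetical cutoff)
    playouts = [p for p in done if p.x <= deep_x]
    red_zone = [p for p in done if dist_to_goal(p.end_x, p.end_y) <= red_zone_yards]
    return {
        "shot_tempo": shots / len(passes),  # shots per pass attempted
        "box_entry_length": mean([length(p) for p in entries]),
        "playout_length": mean([length(p) for p in playouts]),
        "red_zone_pct": len(red_zone) / len(done),
    }


passes = [
    Pass(10, 40, 30, 40, True),
    Pass(30, 40, 60, 20, True),
    Pass(90, 40, 108, 40, True),
    Pass(105, 30, 110, 45, False),
]
print(offensive_metrics(passes, shots=2))
```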

 

Are these the best metrics for judging a style of play? Almost certainly not, that will be a long process full of tweaking and testing. Right now, I feel satisfied this gives us good groupings as the variables are generally measuring different things (none correlate above .5 with each other) and are measuring some reasonably distinctive part of the game. I’ve weighted some metrics more (shot tempo, box attacks, possession, intra-box success rate) and some less (diagonals, forward play) before running these analyses.
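The two checks described here, making sure no pair of metrics correlates too strongly and weighting some metrics up or down before clustering, can be sketched in plain Python. The weight values below are hypothetical: the piece says which metrics were weighted more and which less, but not by how much, and the example metric values are made up.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def max_abs_correlation(metrics):
    """Largest |r| over every pair of metrics (dict: name -> values per team)."""
    return max(abs(pearson(metrics[a], metrics[b])) for a, b in combinations(metrics, 2))


# Hypothetical weights: "more" and "less" come from the text, the numbers are invented
WEIGHTS = {
    "shot_tempo": 1.5, "box_activity": 1.5, "possession": 1.5, "intra_box": 1.5,
    "diagonals": 0.5, "forward_play": 0.5,
}


def apply_weights(profile, weights=WEIGHTS):
    """Scale each metric by its weight; unlisted metrics keep a weight of 1.0."""
    return {m: v * weights.get(m, 1.0) for m, v in profile.items()}


metrics = {
    "shot_tempo": [1, 2, 3, 4],
    "possession": [2, 1, 4, 3],
    "diagonals": [1, 3, 2, 4],
}
print(max_abs_correlation(metrics))  # above .5, so this toy set would fail the check
```

The weights would be applied to the normalized values before clustering, so a heavily weighted metric counts for more in the distance between two teams.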

 

The first analysis was a k-means cluster analysis using those metrics to group similar teams together. “K” is the number of groups, and there is always a debate as to how many you should choose. I ran the analysis with k ranging from 12 to 35 and then looked at how much variance each one left unexplained. 20 was where that variance seemed to stop decreasing consistently, and since I used 20 groups in my defensive piece, I was happy to go with 20 again. If you choose a different k, some teams will shuffle between groups, as certain teams are barely part of one group and could be moved to another without much concern. Once they had been grouped, I ran an agglomerative hierarchical clustering on the group metrics to create a tree graph relating all the groups of teams across Europe. The tree graph as a whole is below; then I will go through each branch for a quick overview.
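The "try a range of k and watch the variance" step can be sketched with a plain Lloyd's k-means in standard-library Python. This is an illustrative implementation, not the tool the author used: it reports the within-cluster sum of squares for each k (keeping the best of a few random starts), and the elbow is the k after which that number stops falling consistently. The toy points stand in for normalized team profiles.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c])) for p in points]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm from k random starting points; returns (centroids, labels, wss)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # keep the old centroid if a cluster ends up empty
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    labels = assign(points, centroids)
    wss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wss


def elbow_curve(points, ks, restarts=5):
    """Lowest within-cluster sum of squares found for each k."""
    return {k: min(kmeans(points, k, seed=s)[2] for s in range(restarts)) for k in ks}


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
curve = elbow_curve(points, range(1, 5))
for k, wss in curve.items():
    print(k, round(wss, 2))
```

With two obvious blobs, the curve drops sharply from k = 1 to k = 2 and then flattens, which is the elbow pattern the piece looked for at k = 20 with real team data.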

 

 

 

 

We will start with the 5 groups at the top.

 

From top to bottom:

Group 6: Lazio, Sampdoria, Torino

is closely related to

Group 5: Genoa, Frankfurt, Newcastle, Parma, Malaga, Villarreal

 

We start out with an enormous yawn: a bunch of solidly midtable teams along with Newcastle and Parma (whose defenses were historically awful; their offenses weren’t nearly as bad). These teams have few standout characteristics either way. They do have well above average shot tempos and are good at passing inside the box. The main difference is that Group 6 plays extremely centrally and plays a lot of diagonal balls, while Group 5 plays higher up the pitch and spends a high % of their time in the “red zone” (within approximately 25 yards of goal).

 

Group 8: Palermo, Atletico Madrid

They don’t play in the center and essentially never play diagonal passes. They have a very high field tilt, yet are well below average at time spent in the red zone and at box attacks. I do wonder which metrics are mainly manager-related and which are player-related, and if it’s even possible to separate them satisfactorily. If you have Messi and Neymar, you will hit a lot of diagonals and have a great intra-box passing % no matter what, even if they play for Diego Simeone, right? I tend to think Atletico rarely play diagonal passes because so many attacks go through the wings, where the only diagonal ball available goes into the teeth of the defense. The fact that Atletico don’t commit many men to attack and seem to set up defense first means there will be fewer options to hit across the field. Hopefully a piece on variance of styles throughout the season can help isolate the manager effect.

Group 3: Cesena, Hull City, Guingamp, Metz, Montpellier, Almeria

and

Group 1: Atalanta, Chievo, Burnley, Palace, Leicester, QPR, West Ham, Bremen

 

These teams rarely play with the ball and play it long repeatedly. Group 1’s shot tempo is by some distance the highest of any group, and they generally spend a lot of time in the red zone and are above average at putting balls into the box. Group 3 has neither of those last two positive metrics, providing the difference.

 

At one point this season, West Ham were being mentioned as a team that might break into the top 5 and were good enough to qualify for Europe (I guess they did qualify, but I doubt many of those pundits were eyeing the Fair Play Table at the time). They wound up as a pretty poor team playing similarly to a lot of other poor teams.

 

 

 

 

 

Group 13: Stoke, Toulouse, Mainz, Augsburg, Hamburg, Freiburg, Stuttgart

and

Group 9: Sassuolo, Udinese, Verona, Koln, Hertha, Paderborn

 

Here we find most of the bottom of the German table. Teams in these groups play normal-length passes deep in their own half but play long balls into the box. They have low possession rates, low field tilt and very low intra-box success rates. Group 13 has more of the ball and tests the box a lot more than Group 9.

Group 16: West Brom, Caen, Lens, Athletic, Espanyol, Real Sociedad

and

Group 14: Sunderland, Bastia, Cordoba, Deportivo, Eibar, Getafe, Granada, Levante

These teams are even worse at passing inside the box and couple it with rarely getting the ball to the box. They generally play long balls throughout the entire field, don’t play centrally, and don’t have a large share of the ball. Group 16 has a higher share of the ball, play shorter passes into the box, and have a significantly higher field tilt. Last year David Moyes was in the Champions League managing Manchester United against Bayern Munich and he ends this season lumped in with Caen and West Brom by some guy on StatsBomb. What a fall.

Group 17: Evian, Nantes, Reims, Elche

and

Group 15: Swansea, Bordeaux, Lille, Lorient, Nice, St Etienne, Valencia

Here we have the patient, pick your spots teams. These teams breach the box at very low rates, but break into the box using short passes (group 15 significantly shorter) and complete a very high rate of their intra-box passes. Group 15 sees more of the ball while Group 17 plays mainly through the wings.

 

Group 11: Aston Villa, Rennes

Low possession teams generally play directly and shoot quickly when they get the ball. They don’t have the quality to hold the ball and play intricately so seem to rush the ball up the pitch and fire. Aston Villa and Rennes do not:

They couple their low shot tempo with the longest average pass length of any group when entering the ball into the box and below average intra-box pass success rates. So they aren’t picking and choosing prime spots, but seem to simply have a lot of useless completions that don’t get them closer to the box or a shot. Not pretty.

Group 18: Marseille, Wolfsburg, Rayo Vallecano

An interesting group here with two high-pressing defenses in Marseille and Rayo. These teams have the lowest average field tilt of any group and a high rate of forward play. They play short passes at both ends of the field and tend to play through the wings in the attacking third. It’s a strange profile as they almost play counter-attack football with very high possession rates. I am guessing the fact the game has become very stretched in Marseille and Rayo’s case leads to many of these numbers. When the other team is wide open you can play forward and don’t spend a lot of time passing it around against a set box (which would raise your field tilt rating).

Group 7: Milan, Schalke

If nothing else comes from this, I think the grouping absolutely got this one right. On a gut level this just feels perfect. Two big-budget teams who performed absolutely dreadfully this season (barring maybe the most bizarre game of the season in Madrid). Slow, ponderous play that rarely gets the ball near the goal or tests the box does not make for good watching. For good measure they are atrocious at passing inside the box. At least they don’t hit a lot of long balls, right?

 

After this group there is a big gap. Look back up at the main tree and you will see there isn’t much similarity between groups 7 and 18 and then 20.

 

 

 

Group 20: Bayern, Barcelona, Celta Vigo

and

Group 4: Empoli, Inter, Roma, Spurs, Juventus

Now we start to get to the high possession, highly effective offenses clustered here at the bottom of the tree. Celta Vigo kind of stand out here, and while they certainly don’t reach the heights of Bayern or Barca, they style themselves similarly. They are good inside the box, hold the ball at very high rates, attack the box a lot and play short passes to enter the box. Combine this offensive style with the crazy Bielsa pressing tactics and the fact that he took the impeccably named Chilean team O’Higgins to their first-ever league title back in 2013, and Celta manager Eduardo Berizzo should at least be considered for bigger and better jobs in the coming years.

 

Inter and AC Milan finished near each other in the table and are linked together in my generally EPL-centric mind but actually played very differently (a high box entry pass length bar here actually refers to a short pass in what was an astoundingly poor design choice):

 

Another team who is interesting in how they profile with these metrics is Empoli. Their defense was mixed in with Fiorentina, PSG, and Manchester United and now their offense reaches high class company as well. Kind of strange for a team that won 8 games all year and finished 15th a year after promotion from Serie B. Without knowing much else about him except these profiles, I’d wager that Maurizio Sarri will be another manager to watch going forward. And as soon as I typed that sentence I scrolled down on his Wiki page to find out that he has been confirmed as the new Napoli manager. Another instance of the profiles running ahead of my knowledge. To get a team with no players on more than $300,000 yearly salary to play like this is quite an achievement.

Group 12: Everton, Manchester United, Lyon, Monaco, PSG, Hannover, Gladbach

 

An imaging error has led to the Gladbach logo being left off. Any Foals fans feeling left out, please go read my long investigation into the entirety of Gladbach and get back to me. These teams have very slow-developing attacks that don’t get up the field at high rates. They are very good at passing inside the box.

 

 

 

 

 

Group 10: Arsenal, Man City, Liverpool, Chelsea, Southampton

Saints and Liverpool just barely make this group, but it shows the serious stratification of the EPL once again. These teams pepper the box (Saints 62nd percentile, all others 78+) from central areas (all above 80th percentile), with short passes (all above 72nd percentile) and are great at completing passes once inside the box (each team above 80th percentile). Chelsea and especially Arsenal and City are near the top of Europe at all of these things but Liverpool and Saints are like the little brothers who are doing what their big brothers do, just not quite as well.

Group 2: Cagliari, Fiorentina, Napoli, Real Madrid

Only PSG played out of the back using shorter passes than Cagliari. Cagliari can take some solace in that stat, and in the fact that their offense was grouped with these 3 teams, as they play in Serie B next season. The main problem was that they allowed 68 goals, and their defense was grouped with QPR, Burnley, and Chievo.

Group 19: Dortmund, Leverkusen, Sevilla, Hoffenheim

 

The crazy uncles, who seem to have little relation to anyone else. Usually shot tempo is inversely related to possession, as the chart with Aston Villa and Rennes showed earlier. Here we see it flipped the other way: teams with high possession who still fire a lot of shots per pass.

 

This group also has a higher % of their passes in the “red zone” (the area within 25 yards or so of goal) than any other group. They pepper the box more than anyone bar the Bayern/Barca group and play extremely centrally.

 

Going forward

I think there is great potential in this type of analysis, especially once we begin to drill down into style-vs-style or game-to-game analysis. Tom Worville thought it could be used in the transfer market by teams looking for flexible players, or for players with experience playing the style they want. I am not sure I would feel comfortable basing player analyses on this broad, team-level data right now, but it could certainly be a good starting point. For example, if Sunderland are looking to fix their offense, maybe they would study what Athletic Bilbao do differently. Since Bilbao do a lot of things similarly to Sunderland, the differences might be easier to bridge than those with Arsenal or Barcelona. At the very least, a quick glance at these graphs can make anyone much more knowledgeable about the game across Europe and help them decide what to look at further. For example, I had no idea about Empoli’s or Celta Vigo’s style of play until a week or so ago. Now I will keep my eye on Maurizio Sarri and Eduardo Berizzo going forward, despite having watched no more than 30 minutes of those two teams in total in the previous year.

And again, this is a rough guideline. If you change any of the metrics or the number of clusters you would get slightly different results. Southampton and Liverpool were close to breaking off into a separate group from the big 3 English offenses, Swansea was close to joining group 17, Bremen and Hannover are only loosely attached to their groups and several more things might have changed. These groups are not set in stone at all.

 

Discussion

If you have any questions, criticisms, comments or want to discuss this further you can reach me on twitter @SaturdayonCouch or post a comment on my blog. I’d love to discuss.

 

 

A Family Tree of European Defenses

 

Last week on StatsBomb I wrote about profiling teams and finding similar teams through styles of play. In that piece I mentioned the next step might be grouping teams as a basis for comparing how certain styles of play match up against each other. This is the first step toward that, as I’ve compiled a rough family tree of European defensive playing styles. First, a reminder of what metrics I’m using to determine style of play:

1. Possession

2. High press score (completion % allowed 60 yards+ from goal)

3. Shot Tempo (shots per completion)

4. Field Tilt (ratio of final third/own third completions allowed, higher means opposition spends more time upfield)

5. Box Activity (passes into box allowed per game)

6. Intra-box completion % (passes that start and end in box)

7. Forward play (% of passes that are forward)

8. Centrality (% of final third completions that are in the middle of pitch)
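Two of these defensive metrics can be sketched directly from the opponent's passes. This is a minimal illustration under assumed conventions: a 120-yard pitch with the opponent attacking toward x = 120 (our goal), "60 yards+ from goal" measured along the pitch from our goal line, and thirds judged by where a completed pass ends. The `OppPass` class is hypothetical.

```python
from dataclasses import dataclass

PITCH_LENGTH = 120.0  # assumed; the opponent attacks toward x = 120, where our goal is


@dataclass
class OppPass:
    x: float          # start, yards from the opponent's own goal line
    end_x: float      # end, same axis
    completed: bool


def high_press_score(passes):
    """Completion % allowed on opponent passes starting 60+ yards from our goal.

    Lower values mean opponents struggle to complete passes in their own half."""
    far = [p for p in passes if PITCH_LENGTH - p.x >= 60]
    return sum(p.completed for p in far) / len(far)


def field_tilt_allowed(passes):
    """Opponent completions ending in their final third / ending in their own third."""
    third = PITCH_LENGTH / 3
    done = [p for p in passes if p.completed]
    final = sum(p.end_x >= 2 * third for p in done)
    own = sum(p.end_x < third for p in done)
    return final / own


opp = [
    OppPass(20, 35, True),
    OppPass(50, 70, False),
    OppPass(30, 25, True),
    OppPass(85, 95, True),
    OppPass(100, 110, False),
]
print(high_press_score(opp), field_tilt_allowed(opp))
```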

 

In the previous piece, I found similarity scores for entire teams. Now, I am focusing only on the defensive side of the ball. Grouping the teams manually or by eye wasn’t practical, so I used a k-means cluster analysis. All the metrics had been normalized, so no one variable dominated. The problem with k-means is you have to choose how many groups there are yourself. To choose, I ran a k-means analysis for each possible number of groups from 5 to 35 and found that 20 was right about the spot where the amount of unexplained variance stopped decreasing consistently. This is called the elbow test, and while the results weren’t totally definitive, they fit in well with the general rule of thumb for determining # of groups, which is the square root of the # of observations/2. All this to say, the 98 teams are divided into 20 groups.

 

Let’s look at some of the interesting groupings. First, group 4:

 

 

Something stands out here. You’ve got these teams with large profiles that you’ve seen late in European competitions recently and then United and Empoli, am I right? Seriously, I’ve never watched a full Empoli game so was surprised to see them pop up in this group. The defining characteristics of this group are very high levels of possession, a low defensive field tilt (opponents spend relatively more time in their own third than in attacking third), low box activity, high shot tempo allowed and are easy to complete the final pass against inside the box. It’s not always easy to get the ball upfield past these defenses, but once you do you have a good chance of a quality look at goal. Might call them the soft underbellies.

 

 

 

Group 8 is another interesting group to look at, I like to call them the nullifiers:

Napoli, Lille, St Etienne, Nantes, and Atletico Madrid have a little above average possession and an average press score as a group and then allow almost no central play (11th percentile), are very good inside the box (23rd), rarely have the box tested (11th), and allow an extremely low shot tempo (9th). This all adds up to under 10 shots allowed per game and barely over a goal per game allowed. I’d figure the managers of these teams are thinking first, second, third and fourth about nullifying the opposition when they set their teams out.

 

 

The final group of teams to look at is one near and dear to my heart, the Bielsa Disciples in Group 16:

 

I personally absolutely love watching the style that Marseille, Celta de Vigo, and Rayo Vallecano play. I say that somewhat hypothetically, as I haven’t watched more than a half of Celta de Vigo and didn’t know much about them before I began this process. When I saw them popping up alongside Marseille, I looked them up to make sure I wasn’t missing something. I found their coach Eduardo Berizzo is a Bielsa disciple who worked as his assistant for several years with Chile. The fact the algorithm knew it before I did is a good sign. These teams all have high possession numbers and extremely high presses. They also allow a higher centrality, a higher shot tempo, and more forward play than any other group. If you love to see crazy man-to-man marking up the pitch with a high risk of a long ball down the center of the pitch leading to a shot, these are the teams to watch. Rayo came into the Camp Nou and ruffled Barcelona for about a half late this season before eventually being blown away 6-1. Marseille led PSG through a goal off a high takeaway before being cut open 3 times in quick succession. This group provides exciting, physical games with lots and lots of open space.

 

 

 

There is a lot more interesting stuff in all the individual groups, but more on that another day. Each group and its central metrics will be at the bottom of the article.

 

Building the tree

I had a bunch of groups, or families of teams, but I wanted to know which were related. I knew the Bielsa Disciples group wasn’t going to be close to a group with Aston Villa, but how close were they to the soft underbelly of United? I wanted to know. So with the 20 groups’ average metrics replacing the individual teams’ metrics, I ran another type of cluster analysis called agglomerative hierarchical clustering. It basically uses similarity scores among groups to build a family tree showing who is related and by how much, until all the groups are connected. I will post the tree with the group names (which mean very little to you), and then go through an example involving a team. Here is what the tree looks like:
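A bare-bones version of the tree-building step might look like this, assuming average linkage and Euclidean distance between group-average profiles (the piece doesn't say which linkage or distance was used). It records each merge and the distance at which it happened, which is what a dendrogram draws. The group profiles in the example are invented numbers, not the real group metrics.

```python
import math


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(profiles):
    """Average-linkage agglomerative clustering.

    profiles: dict of group name -> list of metric values (e.g. group averages).
    Returns the merge history as (left, right, distance) tuples, closest first.
    """
    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        best = None
        names = list(clusters)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                # mean distance between every member of a and every member of b
                pairs = [euclid(profiles[x], profiles[y]) for x in clusters[a] for y in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[f"({a}+{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges


# Invented three-metric profiles, purely for illustration
groups = {
    "G7": [80, 20, 75],
    "G20": [78, 25, 60],
    "G16": [70, 95, 50],
    "G2": [20, 30, 25],
}
for left, right, dist in agglomerate(groups):
    print(f"{left} + {right} at {dist:.1f}")
```

The earliest, shortest merges are the "siblings" in the family-tree language of the piece; groups that only join near the end are the distant relatives.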

 

 

 

We will be using a team from group 7 to follow up the tree. Group 7 teams are distinctive for a very low shot tempo allowed, low box activity, and high possession. This is a common defensive profile among good Italian and English teams (why that is is a fascinating question but well beyond the scope of this article). The profile and members are below:

 

So if you are Liverpool (our example), the teams who defended most similarly to you this year were Saints, Juventus, Chelsea, Arsenal, Lazio and Man City. These are your siblings on the defensive family tree.

 

We can see on the tree that this group of teams is very closely related to group 20, mainly because group 20 teams also allow very low box activity and a very low shot tempo. Group 20 is an all-Spanish affair:

 

On the family tree, these are the cousins/primos who you see at the holidays if you are Liverpool. The main differences between group 7 and 20 are the Spanish sides have lower possession, higher pressing, have a much lower centrality percentage, and are a bit easier to pass against inside the box.

 

You can see that playing a similar style doesn’t guarantee similar results: Levante’s and Athletic’s styles led to a 25-goal difference in goals allowed, with the huge possession difference a big factor there.

 

The next step up the tree we see groups 13, 4, and 10. These are the relatives you might see at a family reunion every few years. We already saw group 4 above (the soft underbellies of United, Empoli, PSG, and Fiorentina). Group 10 has Lyon, Roma, and Bordeaux:

 

and group 13 has the strange trio of Spurs, Everton, and Real Madrid:

 

 

The rest of the teams on the right side of the defensive family tree are only tangentially related to Liverpool and their siblings: Group 16 (Bielsa Disciples), Group 18 (Toulouse and Malaga), Group 6 (Inter and Barcelona), Group 19 (the dominant German pressure of Dortmund, Bayern, and Leverkusen), and Group 8 (the Nullifiers). Maybe at a funeral or a wedding you see one or two of these guys, but you are vaguely aware there is a little of your blood in their bodies. Once you get to the other side of the family tree, it’s open season. No longer are you close enough relatives to know any of the same people or worry about incest, you are basically total strangers. Some of these total strangers include a massive group 5 full of German teams, Gladbach and Torino’s total box shutdowns in group 12 and the general low quality of Chievo, QPR, Burnley, and Villa in group 2.

 

 

 

Strengths, weaknesses and looking forward

This is the next step in a project that ultimately aims to determine whether certain types of teams play better against other types. Maybe we find out that the Bielsa Disciples do better than expected against teams who like to have low possession and use long balls to move the ball upfield. Maybe the Nullifiers don’t do well against those teams. I don’t know if anything will come of that, but that’s what I’m going to continue to look at. The next step is to build an offensive family tree and then start looking at match-ups between the two.

 

The strengths are I like identifying relationships I had no idea about before. The Celta Vigo example is perfect. It was a team I knew little about and the numbers told me they played like Bielsa disciples before I had researched them or watched a game.

 

The possible weakness is that too much information may be lost along the way. Passing numbers are converted to percentiles, which are used as one of 8 metrics to group teams together, and then clustered further. So much is lost at each step that it’s hard to draw any concrete conclusions from just seeing this tree. It’s an incredibly fun and, I think, valuable informational tool, but it might have limited value if you are a coach or a GM setting up a team. Also, the metrics could (and almost surely should) use some fine-tuning. Field tilt in particular needs to be honed to include every pass, and each metric could use a full article or two to really flesh out. For now, this remains a first draft, but one that could open promising new areas for exploration.

 

To conclude the article, every group and its metrics are posted below. Hopefully you enjoyed it! Comments are closed here, so if you have questions, comments, or want to discuss the article, you can go to my blog by clicking here and comment on the article (which should be posted soon after this article runs here) or chat with me on twitter @SaturdayonCouch

 

The Groups

Group 1

Atalanta, Milan, Sampdoria, Verona, Leicester City, Stoke City, Sunderland, Swansea City, West Brom, West Ham

 

 

Group 2

Cagliari, Chievo, Palermo, Aston Villa, Burnley, QPR

 

Group 3

Cesena, Udinese, Lens

 

Group 4

Empoli, Fiorentina, Manchester United, PSG

 

Group 5

Genoa, Mainz, Frankfurt, Augsburg, Schalke, Hamburg, Hannover, Freiburg, Bremen, Hoffenheim, Stuttgart, Wolfsburg

 

Group 6

Inter, Barcelona

 

Group 7

Juventus, Lazio, Arsenal, Chelsea, Liverpool, Southampton, Man City

 

Group 8

Napoli, Lille, Nantes, St Etienne, Atletico Madrid

 

Group 9

Parma, Palace, Hull, Newcastle, Koln, Hertha, Paderborn

 

Group 10

Roma, Bordeaux, Lyon

 

Group 11

Sassuolo, Cordoba, Deportivo, Eibar, Espanyol, Getafe

 

Group 12

Torino, Gladbach

 

Group 13

Everton, Spurs, Real Madrid

 

Group 14

Bastia, Caen, Evian, Guingamp, Montpellier, Rennes

 

Group 15

Lorient, Metz, Monaco, Reims

 

Group 16

Marseille, Celta de Vigo, Rayo Vallecano

 

Group 17

Nice, Almeria, Elche, Granada, Real Sociedad, Sevilla, Villarreal

 

Group 18

Toulouse, Malaga

 

Group 19

Leverkusen, Dortmund, Bayern

 

Group 20

Athletic, Levante, Valencia

 

 

The central metrics for each group: (percentiles)


Team Style Profiles and Similarity Scores

 

 

How do you find the most English team? You could count English internationals, home-grown players, the most fans, or simply refer to the picture above and declare that game the most English game of recent years. I took a different approach to find out which team played closest to the English style this last season. To do so, we need to develop a way of profiling teams by their style. For this we will use a number of metrics, listed below:

 

Both offense and defense

-possession

 

Offense

-field tilt (ratio of attacking third/own third completions)

-shot tempo (shots per pass)

-intrabox success rate (completion % on passes that begin and end inside the box)

-pass length

-centrality (% of passes toward the center of the pitch in the final third)

-box attacks (passes into the box)

-forward play (% of passes that are forward)

 

Defense

-field tilt

-high press rate (% of passes completed that are 60+ yards away from the goal)

-shot tempo

-intrabox success rate

-centrality

-box attacks

-forward play
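As a rough illustration of how the offensive metrics above could be computed from pass-level event data, here is a minimal Python sketch. Everything about the data is an assumption, not the author's setup: the hypothetical `Pass` record, the coordinates (x as yards from the goal being attacked, y as yards from the middle of the pitch), a 105-yard pitch split into 35-yard thirds, counting field tilt by where a completed pass starts, and counting box attacks as attempted passes from outside the box that end inside it. Possession is left out because it needs both teams' pass counts, and the high press rate is left out for brevity; most of the defensive versions come from running the same function on the opponents' passes and shots.

```python
import math
from dataclasses import dataclass

# Assumed coordinates (hypothetical, not the author's data format):
# x = yards from the goal being attacked, y = yards from the pitch's
# centre line running end to end (0 = middle, +/- = either flank).
PITCH_LENGTH = 105.0                       # assumed pitch length in yards
THIRD = PITCH_LENGTH / 3                   # 35 yards per third
BOX_DEPTH, BOX_HALF_WIDTH = 18.0, 22.0     # penalty area is 18 x 44 yards


@dataclass
class Pass:
    start_x: float
    start_y: float
    end_x: float
    end_y: float
    completed: bool


def in_box(x: float, y: float) -> bool:
    """True if the point lies inside the penalty area being attacked."""
    return x <= BOX_DEPTH and abs(y) <= BOX_HALF_WIDTH


def style_metrics(passes: list[Pass], shots: int) -> dict[str, float]:
    """Raw offensive style metrics for one team; pass in the opponents'
    passes and shots to get the defensive versions."""
    n = len(passes)
    completed = [p for p in passes if p.completed]
    attacking = sum(1 for p in completed if p.start_x <= THIRD)
    own = sum(1 for p in completed if p.start_x > 2 * THIRD)
    intrabox = [p for p in passes
                if in_box(p.start_x, p.start_y) and in_box(p.end_x, p.end_y)]
    final_third = [p for p in passes if p.start_x <= THIRD]
    return {
        # attacking-third completions / own-third completions
        "field_tilt": attacking / own if own else math.inf,
        # shots per pass
        "shot_tempo": shots / n,
        # completion rate on passes that begin and end in the box
        "intrabox_success": (sum(p.completed for p in intrabox) / len(intrabox)
                             if intrabox else math.nan),
        # mean straight-line pass length in yards
        "pass_length": sum(math.hypot(p.end_x - p.start_x, p.end_y - p.start_y)
                           for p in passes) / n,
        # share of final-third passes that end nearer the middle than they started
        "centrality": (sum(abs(p.end_y) < abs(p.start_y) for p in final_third)
                       / len(final_third) if final_third else math.nan),
        # passes from outside the box that end inside it
        "box_attacks": sum(1 for p in passes if not in_box(p.start_x, p.start_y)
                           and in_box(p.end_x, p.end_y)),
        # share of passes that end closer to goal than they started
        "forward_play": sum(p.end_x < p.start_x for p in passes) / n,
    }
```

The function returns raw rates; turning them into percentiles against the European distribution is a separate step.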

 

For each metric, a team’s rate was compared to the European average and standard deviation to get a z-score, which was then used to build a team profile. For example, Villarreal allows 31% of intrabox passes to be completed. The European average is 40.4% with a standard deviation of 5.4%, which gives a z-score of about -1.74 and puts Villarreal in the 4th percentile for ease of intrabox passing against. This is done for each metric to create a team profile (Villarreal shown again):

 

 

The two things that jump out are that they shut down the box and force teams to the flanks more than any other team in Europe.
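The step from a raw rate to a percentile can be sketched in a few lines of Python. The article does not say exactly how z-scores were turned into percentiles; this sketch assumes a normal distribution (percentile = 100 × Φ(z), computed with `math.erf`), which does reproduce the 4th-percentile figure for Villarreal's intrabox defense quoted above. A low percentile here means passing inside the box is hard against that team.

```python
import math


def z_score(value: float, mean: float, sd: float) -> float:
    """Standard score of a team's rate against the European distribution."""
    return (value - mean) / sd


def percentile(z: float) -> float:
    """Percentile under an assumed normal distribution: 100 * Phi(z)."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Villarreal's intrabox completion % allowed vs the European average and SD
z = z_score(31.0, 40.4, 5.4)
print(round(z, 2), round(percentile(z)))  # -1.74 4
```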

 

If you do this for each team in a league, you begin to see some significant stylistic differences. I’ve looked at differences in shooting across leagues before, and Colin Trainor and others have written about it on this site. Others have written very well about defensive differences from league to league (two are here and here). These profiles are another way of looking at league differences, through how teams play the ball. Spanish, Italian, and English teams have significantly higher field tilt than German and French teams. England and France are well ahead in intrabox pass %, with Spain and Germany significantly behind. Box passes can be seen below:

 

 

 

Putting it all together, here is the composite England style profile (average of each team):

 

 

 

To find the most English team we need another tool in its early stages: the Style Similarity Score. It’s a simple tool that compares percentile differences across the different categories (each weighted slightly differently according to importance, which is the order of the metric list at the start of the article) and sums all of those differences into a single number. If a team had exactly the same numbers as another, their Style Similarity Score would be 0, and the higher the score, the more different the two teams’ playing styles theoretically are. Here are two quick examples:

 

 

These results don’t clash with anything the eye test has shown me, which makes me think this is a good first step. I wanted to use this new tool to find the real essence of each league. The glitz and glamour of Arsenal, Bayern, Barcelona and PSG are well known but certainly aren’t representative of the average team in each of those leagues. So I ran the English profile from above through the similarity score to find the two most similar teams, so I’d know which game to watch if I wanted to find the true heart of Premier League football. I did this for each of the top 5 leagues.
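The composite league profile and the Style Similarity Score described above can be sketched in Python as follows. This is an illustration under assumptions, not the author's implementation: profiles are dictionaries of percentiles, the composite is a plain per-metric mean of the teams' percentiles, the score is a weighted sum of absolute percentile gaps, and the weights and the three teams in the usage example are made up, since the article does not publish its weights.

```python
# A style profile maps metric name -> percentile (0-100).
Profile = dict[str, float]


def composite_profile(profiles: list[Profile]) -> Profile:
    """League profile: the average of each team's percentile, metric by metric."""
    metrics = profiles[0].keys()
    return {m: sum(p[m] for p in profiles) / len(profiles) for m in metrics}


def similarity_score(a: Profile, b: Profile, weights: dict[str, float]) -> float:
    """Weighted sum of absolute percentile gaps: 0 means identical profiles."""
    return sum(w * abs(a[m] - b[m]) for m, w in weights.items())


def most_similar(target: Profile, teams: dict[str, Profile],
                 weights: dict[str, float], k: int = 2) -> list[str]:
    """The k team names whose profiles sit closest to the target profile."""
    return sorted(teams, key=lambda t: similarity_score(target, teams[t], weights))[:k]


# Hypothetical teams and weights, for illustration only.
teams = {
    "Team A": {"possession": 80, "field_tilt": 75, "forward_play": 30},
    "Team B": {"possession": 50, "field_tilt": 45, "forward_play": 55},
    "Team C": {"possession": 20, "field_tilt": 25, "forward_play": 70},
}
weights = {"possession": 1.2, "field_tilt": 1.1, "forward_play": 1.0}

league = composite_profile(list(teams.values()))
print(most_similar(league, teams, weights))  # ['Team B', 'Team C']
```

Because a lower score means a closer match, sorting ascending and taking the first two gives the pair of teams whose meeting best represents the league's average style.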

 

Results

 

England: Stoke City v Aston Villa

Italy: Palermo v Sassuolo

France: Lorient v St Etienne

Germany: Frankfurt v Stuttgart

Spain: Deportivo de La Coruna v Valencia

 

If you had sat down and watched all 11 of the matches between these sides this season, I think you would have a good taste of the differences between the leagues. The results alone are telling: Frankfurt and Stuttgart played a 5-4 classic as well as a 3-1, while St Etienne beat Lorient three times, 2-0, 1-0 and 1-0, without a first-half goal.

 

The EPL is an interesting case, as it has far fewer teams that “look” like the average side. This is because the league is more stratified in the way its teams pass. Burnley, QPR, Palace, Stoke, West Ham, and Hull are all in the top 15% for long balls, while Arsenal, City, Swansea, Liverpool, Spurs, Chelsea, and Everton are in the bottom 15%. This wide split means there isn’t a big group of teams playing near the average English style (as there is in Germany, France and Spain), but Stoke-Villa is as close as it gets.

 

 

Where do we go from here? 

With more work, team profiles and similarity scores could be used to look at how teams and styles match up against one another. If we can see that Dortmund struggle more against teams who press them back than against teams who sit back and force play wide, you can alter your tactics (if you are a manager) or your bets. It’s another piece of information on top of shot data like expG: if Villarreal and Marseille had the same expG rating, you would know Dortmund was a better bet against Marseille’s style than Villarreal’s. Maybe teams that sit back and play long balls do great against teams with high final third possession numbers, as conventional wisdom says; maybe they don’t. Game-to-game and month-to-month changes in tactics and style could be tracked much more clearly. Teams with similar styles could be grouped together to see whether their shots or shots allowed differ from the norm, which could improve xG models.

One early example of this involves Swansea. I wrote about how expG models do not properly capture what Gladbach have been doing, so I was interested to see which teams were similar to them. They turned out to have a rather unique profile without many similar teams, but the closest was Swansea. Despite having a poor intrabox defense, the Swans track well with Gladbach. When I checked their goals relative to expected goals, sure enough they have been overperforming for three straight seasons in my model. I haven’t done a deep dive into that yet, but it’s something I might not have seen without the similarity score.

These Team Style Profiles and Style Similarity Scores are good first steps, but there is lots of room for improvement, and without tracking data there are limitations. Should different metrics be chosen? There are pretty strong relationships between possession, field tilt, and box attacks, for example, so should they all go into the mix? Should the weight assigned to each metric when comparing teams be adjusted? What about teams who change styles often within games and across a season, like Thomas Tuchel did at Mainz? At the end of the year the stats only look one way, but that covers up a ton of variance; there definitely needs to be a metric for flexibility. Changes will certainly be made, one of the first being to improve field tilt so it includes all completions rather than a simple attacking-third/own-third ratio.