A 2014 McKinsey report on the data analytics revolution highlights the need for “translators”, those with a skillset that spans some combination of data, analytics, IT, and business decision-making.  The recent OptaPro Forum echoed this need in the soccer world: how do analysts, scouts, and decision-makers best bridge the gap from the technical world to the soccer world? 

MLS formed a partnership with Opta Sports in 2011, providing the league with the same rigorous match analysis that’s essentially a global standard.  However, as recently as last summer, some MLS teams claimed little interest in analytics, or that analytic insights somehow lack importance because they don’t cover intangibles.  What’s more, information described as “analytics” can easily focus on performance optimization instead of increased tactical understanding.

Much like the “nature or nurture?” debate, “analytics or scouting?” is a false dichotomy.  Both can and do help improve understanding of the game.  There are signs of analytics’ growing influence.  Take the concept of Expected Goals: former Opta Analyst and current Toronto analyst Devin Pleuler referenced it in a video on MLSSoccer.com, adding context to New England’s streaky mid-season results.  Arsene Wenger mentioned the term in a recent interview, and the official Premier League Web site featured the stat when comparing Chelsea and Leicester.  However, a resistance toward advanced statistics persists.  Tim Sherwood called Expected Goals a “load of nonsense”, and Roy Hodgson recently ranted on why numbers don’t matter in soccer.


It’s worth noting that those in the soccer analytics community have likely spent far more time with certain concepts than soccer practitioners have.  Within that community, Expected Goals is a fairly well-known concept.  Expert modelers such as Michael Caley, 11tegen11, and the group at American Soccer Analysis have spent years honing their respective models, testing and learning along the way.  What’s more, work continues on the study of Expected Goals, including elements like defensive positioning and shots not taken.

While the analytics community is already on third-order implications of Expected Goals, it’s vital to remember just how new it is to mainstream soccer discussion.  The next time I hear Expected Goals mentioned on an MLS broadcast will be the first time.  This isn’t to say that Expected Goals is some inscrutable measure.  Ultimately, it is a measure of probability – Danny Page provides an excellent overview here.  In the moment, a shooter either scores or doesn’t, but over time, certain shots have a higher likelihood of resulting in a goal.  It aligns with intuitive soccer sense that, all else being equal, it’s easier to score closer to the goal than farther away.
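To make the probability idea concrete, here is a minimal sketch in Python.  It assumes each shot has already been assigned a scoring probability by some model; the values below are invented for illustration and are not real match data, and the function name is my own.  An Expected Goals total is then simply the sum of those per-shot probabilities.

```python
def expected_goals(probabilities):
    """Sum per-shot scoring probabilities into an Expected Goals total."""
    for p in probabilities:
        # Each value is a probability, so it must sit between 0 and 1.
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return sum(probabilities)


# Hypothetical shots: two long-range efforts, a header, a shot from the
# edge of the box, and a close-range chance.  Illustrative values only.
shot_probabilities = [0.05, 0.05, 0.10, 0.30, 0.75]

xg_total = expected_goals(shot_probabilities)
print(f"{len(shot_probabilities)} shots, {xg_total:.2f} expected goals")
```

Any single shot still ends as a goal or a miss; the total only describes what shots of this quality tend to produce over time.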

An analytics expert can make this information actionable by showing how a player generates an Expected Goal total.  Sebastian Giovinco, Bradley Wright-Phillips, and Kei Kamara led MLS in Expected Goals last season, but Giovinco had a far greater percentage of unassisted shots than Wright-Phillips or Kamara.  Translating this insight – Giovinco can create his own shot – into visual form is even better, and is a best practice from Pep Guardiola and staff on down.

Another helpful approach toward bridging the gap is to index statistics.  As an example, let’s take a concept that’s been around since the 2010 World Cup: attacking tendencies by channel.  I often focus on the 3 or 4 bands of defenders, midfielders, and forwards, but in modern soccer, the pursuit of space is horizontal and vertical.  Finding teams that over-index or under-index relative to a league average can uncover more context on preferred playing style (StatsBomb does this extremely well).

Here are the attacking tendency figures for the 2015 MLS season:


Vancouver topped the league in left side attacking tendency: 39% of their attack came from the left channel, compared to a league average of 35%.  Said another way, Vancouver had a left-side attack index of 111, where 100 represents the league average.  Columbus was the most central team last year, with an index of 116.  No team stood out as overly right-sided.
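For anyone who wants to reproduce the indexing, here is a minimal sketch.  It assumes the index is a team's share divided by the league-average share, multiplied by 100 and rounded, which matches the Vancouver figures quoted above; the function name is my own.

```python
def channel_index(team_share, league_share):
    """Index a team's channel share against the league average (100 = average)."""
    if league_share <= 0:
        raise ValueError("league_share must be positive")
    return round(100 * team_share / league_share)


# Figures quoted in the text: Vancouver attacked down the left 39% of the
# time, against a league average of 35%.
vancouver_left = channel_index(0.39, 0.35)
print(f"Vancouver left-side attack index: {vancouver_left}")
```

An index above 100 means a team leans on that channel more than the league does; below 100 means less.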

Here’s where the combination of stats analysis and soccer analysis matters.  The above chart is helpful, but incomplete without additional background.  I think of it like a two-person broadcast booth: analytics provides the play-by-play, while soccer experience provides the color commentary.  The two create a virtuous cycle of increased understanding.


Vancouver’s left-sided tendency likely comes from Kekuta Manneh’s increased effectiveness as the attacking left midfielder in Vancouver’s 4-2-3-1.  Traditional stats show he started almost every game (a good sign of importance), scored 7 goals, and had 6 assists.  Traditional scouting would show he’s absurdly fast, likes taking defenders on 1 v 1, and plays as an inverted winger to set up a right-footed shot.  Further stats to augment this player profile show that Manneh was 4th in the league in successful dribbles (2.2 per game), but he connected on only 6 of 56 attempted crosses.  Broadening our scope, we see that left back Jordan Harvey rarely surged forward to provide additional width or crossing.  We can assume that Manneh is indeed generating the attack from this side.  With this fuller picture, opposing defenses will likely play deeper than normal to ensure Manneh doesn’t get in behind, and Vancouver will look for ways to isolate Manneh out wide on the left.

Of course, one of the challenges with statistics is application in the moment.  As Mike Goodman wrote in his analytics article featuring quotes from Ted Knutson and Daniel Altman, fluctuations and variance in performance are inherently messy.  To illustrate this, here is the game-by-game attacking tendency for San Jose, a team that was essentially league-average by the end of the year.


Whereas Vancouver rarely deviated from its 4-2-3-1 and had stable personnel, San Jose experienced upheaval when they lost Innocent Emeghara to injury in the 8th game of the season.  Add in mid-season acquisitions of Marc Pelosi, Anibal Godoy, and Quincy Amarikwa, and it’s not surprising that San Jose experimented with various formations throughout the season.  Here, traditional scouting should take the lead, informing teams of player availability or recent changes.  Analytics could complement that work with a rolling average of attacking channel preference throughout the season.  This can help identify outliers (the 9th game, where the Quakes attacked down the left 49% of the time), or a substantial shift (a more balanced approach toward the end of the season with Godoy in central midfield).
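Here is one way the rolling-average idea could be sketched.  The game-by-game shares below are invented for illustration, apart from the 49% in the 9th game mentioned above; the 5-game window and the 10-point threshold are also my own assumptions, not a standard.  Each game is compared against the rolling average of the games before it, so an outlier doesn't mask itself.

```python
def rolling_mean(values, window):
    """Trailing rolling mean; entries with fewer than `window` games are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    means = []
    for i in range(len(values)):
        if i + 1 < window:
            means.append(None)  # not enough games played yet
        else:
            chunk = values[i + 1 - window : i + 1]
            means.append(sum(chunk) / window)
    return means


def flag_outliers(values, rolling, threshold):
    """Return 1-based game numbers that differ from the prior rolling mean
    by more than `threshold` percentage points."""
    flagged = []
    for i in range(1, len(values)):
        prior = rolling[i - 1]
        if prior is not None and abs(values[i] - prior) > threshold:
            flagged.append(i + 1)
    return flagged


# Hypothetical left-channel shares (percent) by game.  Only the 49 in the
# 9th game comes from the text; every other value is made up.
left_share = [34, 36, 35, 33, 37, 35, 34, 36, 49, 35, 36, 34]

rolling = rolling_mean(left_share, window=5)
print(flag_outliers(left_share, rolling, threshold=10))
```

A shorter window reacts faster to a formation change but is noisier; a longer one smooths out single games at the cost of spotting a shift later.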

As Brian Phillips eloquently stated, “Soccer gives players more chaos to contend with than any other sport.  So there’s something uniquely thrilling about the moments when they manage to impose their order on it.”  The pursuit of that order is best achieved through a combination of analytics and traditional soccer analysis.