How we measure “stylistic fit” without overfitting nostalgia
Match clusters players by behaviour, not biography. The math behind why “a left-back like Marcelo” is a useful comparison and how we keep it honest.
By Match Engine team
“We need a left-back like Marcelo.” It’s a useful sentence and a dangerous one. Useful because it instantly conveys a profile every football brain in the room can picture. Dangerous because it can mean five different things to five different listeners — and worse, it can mean nothing precise to a search engine.
Match, our recommendation engine, lives at the boundary between those two truths. It has to translate fuzzy operator language into rigorous, comparable, defensible rankings — without flattening what makes a player distinct. Here is how we do it without overfitting nostalgia.
The three-layer model
Stylistic fit at Scout Atlas is not a single similarity score. It is a stack of three independent layers, each computed nightly, each explained in plain English alongside the result.
Layer 1 — Behavioural fingerprints
For every player with at least 900 league minutes in the last two seasons, we compute a 200-dimensional behavioural vector. Not raw stats. Behavioural derivatives: progressive carry distance per touch, defensive zone activity adjusted for opposition strength, scanning frequency before progressive passes, post-loss recovery distance.
These are the features that survive normalisation across leagues. Seventy minutes in a League of Ireland match isn’t the same canvas as seventy in the Premier League — so we normalise opportunities, not outcomes. The fingerprint compares behaviour at parity.
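The opportunity-normalisation idea fits in a few lines. A minimal sketch with invented numbers and names, not Scout Atlas code:

```python
# Minimal sketch of per-opportunity normalisation (illustrative data, not production code).
# Instead of dividing a raw total by minutes played, each behavioural feature
# is expressed as a rate over the opportunities the player actually had.

def per_opportunity(total: float, opportunities: int) -> float:
    """Rate per opportunity; returns 0.0 when the player had no opportunities."""
    return total / opportunities if opportunities else 0.0

# Two hypothetical left-backs: one in a possession-heavy side, one starved of the ball.
player_a = {"progressive_carry_m": 420.0, "touches": 600}  # busy side
player_b = {"progressive_carry_m": 210.0, "touches": 300}  # half the touches

rate_a = per_opportunity(player_a["progressive_carry_m"], player_a["touches"])
rate_b = per_opportunity(player_b["progressive_carry_m"], player_b["touches"])

# Raw totals would make player_a look twice as progressive;
# per-touch rates show identical behaviour given the chances each had.
assert rate_a == rate_b == 0.7
```

The same pattern applies to every fingerprint dimension: the denominator is always a count of chances, never a count of minutes.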
Layer 2 — Role context
“A left-back like Marcelo” isn’t just a behavioural shape. It’s a behavioural shape in a system. We tag every match in our corpus with the player’s implied role (inverted full-back, classical full-back, wing-back in a back five, hybrid wide centre-back) using a graph-based formation classifier. Stylistic similarity is then computed conditional on role — so a Bayern hybrid is compared against other hybrids, not against an Atalanta wing-back.
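Role conditioning is easy to sketch: compute similarity only inside the query player's role cohort. The vectors and role tags below are invented for illustration:

```python
# Hedged sketch of role-conditional similarity (hypothetical players and vectors).
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

players = {
    "marcelo":   {"role": "hybrid_fullback", "vec": [0.9, 0.8, 0.3]},
    "candidate": {"role": "hybrid_fullback", "vec": [0.85, 0.75, 0.35]},
    "wingback":  {"role": "wingback_back5",  "vec": [0.9, 0.8, 0.3]},  # same shape, wrong role
}

def stylistic_peers(query, corpus):
    """Rank peers by cosine similarity, conditional on sharing the query's role tag."""
    q = corpus[query]
    pool = [(name, cosine(q["vec"], p["vec"]))
            for name, p in corpus.items()
            if name != query and p["role"] == q["role"]]
    return sorted(pool, key=lambda t: -t[1])

peers = stylistic_peers("marcelo", players)
# The wing-back is excluded despite an identical vector: similarity is role-conditional.
```

The point of the filter is visible in the toy data: an identical behavioural vector in the wrong role never enters the comparison pool.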
Layer 3 — Decision signature
The third layer is the most experimental and the one we’re most excited about. We train a sequence model on labelled decision points — receive-under-pressure, defensive-press-trigger, transition-spring — and produce a probability distribution over decision classes for each player. The decision signature captures what a player tends to do when given a choice. Two players with identical behavioural fingerprints can have completely different decision signatures, and the signature usually predicts how the player adapts to a new system.
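Because each signature is a probability distribution over decision classes, two signatures can be compared with a standard divergence. One reasonable choice is Jensen-Shannon divergence; the class names and distributions below are invented, and the production distance may differ:

```python
# Hedged sketch: comparing decision signatures as probability distributions.
import math

CLASSES = ["carry", "short_pass", "switch", "clearance"]  # illustrative decision classes

def kl(p, q):
    """Kullback-Leibler divergence in bits (terms with p_i = 0 contribute nothing)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded in [0, 1] with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Two players with similar fingerprints but different tendencies on the ball:
risk_taker   = [0.45, 0.30, 0.20, 0.05]  # carries and switches under pressure
safety_first = [0.10, 0.55, 0.05, 0.30]  # recycles or clears

d = js_divergence(risk_taker, safety_first)
assert 0.0 < d <= 1.0  # clearly non-zero: identical fingerprints, different signatures
```

A symmetric, bounded divergence matters here: the distance from A to B should equal the distance from B to A, and scores should be comparable across pairs.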
Three things we explicitly don’t do
Every recommendation engine is shaped by what it refuses to do. Match has three firm refusals.
- We don’t train on private member-club data without consent. The fingerprints come from public open-data corpora and licensed event data. Member clubs’ private notes, GPS, and shortlists are theirs — they enrich a club’s personal model, not the cross-club one.
- We don’t hide the leagues a brief covered. If a brief filtered to the top 5, we say so on every result. If a player wasn’t included, we tell you why (insufficient minutes, league not yet ingested).
- We don’t pretend a 60%-confidence ranking is a 95. When the ensemble disagrees — XGBoost likes a player, CatBoost is unsure — we flag the variance directly. Low confidence is itself a signal worth surfacing.
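The variance flag in the last refusal can be sketched directly. Model names, scores, and the spread threshold below are all illustrative:

```python
# Hedged sketch: surfacing ensemble disagreement instead of hiding it.
import statistics

def confidence_flag(scores, max_spread=10.0):
    """Report the ensemble mean alongside its spread; flag when members disagree."""
    values = list(scores.values())
    spread = statistics.pstdev(values)  # population standard deviation across members
    return {
        "mean": round(statistics.mean(values), 1),
        "spread": round(spread, 1),
        "low_confidence": spread > max_spread,
    }

# One model likes the player, another is unsure: the variance itself is surfaced.
result = confidence_flag({"xgboost": 82.0, "catboost": 55.0, "lgbm": 70.0})
assert result["low_confidence"] is True
```

When the members agree, the same function returns a tight spread and no flag; the output shape is identical either way, so the UI can always show both numbers.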
How we keep the comparisons honest
Two safeguards run alongside every Match score.
The first is survivor bias correction. Football media gravitates to winners. Behavioural similarity to a famous player can be a dangerous proxy — it’s a great filter for picking up on retrospective genius and a poor one for predicting future fit. We explicitly rebalance training cohorts to include the “noisy middle”: players who looked like a star and didn’t become one.
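One simple way to do that rebalancing is inverse-frequency weighting, so each outcome class carries equal total weight in training. A sketch with invented cohort sizes and labels:

```python
# Hedged sketch of cohort rebalancing via inverse-frequency weights
# (labels and counts are illustrative, not the production training set).
from collections import Counter

# A winner-skewed cohort: famous successes dominate the raw data.
cohort = (["star"] * 80) + (["noisy_middle"] * 15) + (["faded"] * 5)

def inverse_frequency_weights(labels):
    """Per-label sample weights so each class contributes equal total weight."""
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {label: total / (n_classes * count) for label, count in counts.items()}

weights = inverse_frequency_weights(cohort)
# The under-represented "noisy middle" is upweighted relative to the stars,
# so the model also learns from players who looked the part and faded.
assert weights["noisy_middle"] > weights["star"]
```

Any weighted learner can consume these as sample weights; the effect is that "looked like a star" stops being a near-guarantee of the positive label.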
The second is cohort calibration. We test the model not on the Premier League golden child, but on the Allsvenskan winger nobody had heard of in 2021 who is now a Bundesliga regular. If the model couldn’t have surfaced him with high confidence in 2021, we go back to the drawing board. Most “similarity” engines celebrate the players they predicted; we measure ours by the players they missed.
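The calibration test amounts to a time-sliced backtest: freeze the data at a cutoff and ask whether the model would have surfaced the player then. A toy sketch; the identifier, features, threshold, and stand-in model are all hypothetical:

```python
# Hedged sketch of a time-sliced backtest: score a later-confirmed player
# using only data available at the cutoff (all names and values hypothetical).

def backtest(model_score, case):
    """Pass only if the frozen-in-time model would have surfaced the player confidently."""
    score = model_score(case["features_at_cutoff"])
    return score >= case["required_confidence"]

# A breakout the model must have been able to find *before* the breakout:
case = {
    "player": "allsvenskan_winger_2021",    # hypothetical identifier
    "features_at_cutoff": [0.7, 0.9, 0.6],  # fingerprint as of the 2021 cutoff
    "required_confidence": 0.75,
}

toy_model = lambda feats: sum(feats) / len(feats)  # stand-in for the real ensemble
passed = backtest(toy_model, case)
# passed is False here: this model misses the breakout, so it goes
# back to the drawing board rather than into production.
```

The discipline is in the direction of the test: the model is scored on the players it should have found, not on the ones it happened to find.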
What you actually see in the product
When you open a player in Scout Atlas, “Stylistic peers” shows the top six players across our corpus by combined fingerprint + role + decision similarity, with a feature-attribution breakdown for each pair: where the similarity is concentrated, where it diverges. You see the comparison and the limits of the comparison.
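A minimal sketch of how the three layers might combine into one score with a per-layer attribution breakdown. The weights are illustrative, not the production mix:

```python
# Hedged sketch: combining fingerprint, role, and decision similarity into one
# score, with an attribution breakdown (weights are illustrative).

WEIGHTS = {"fingerprint": 0.5, "role": 0.2, "decision": 0.3}

def combined_similarity(layer_scores):
    """Weighted sum of per-layer similarities, plus each layer's share of the total."""
    contributions = {k: WEIGHTS[k] * layer_scores[k] for k in WEIGHTS}
    total = sum(contributions.values())
    # Attribution: where the similarity is concentrated, and where it diverges.
    attribution = {k: round(v / total, 2) for k, v in contributions.items()}
    return {"score": round(total, 3), "attribution": attribution}

pair = combined_similarity({"fingerprint": 0.9, "role": 1.0, "decision": 0.4})
# High overall similarity driven by fingerprint and role; the decision
# signature is the layer where this pair diverges.
```

The attribution dictionary is what makes the comparison auditable: a scout sees not just "these two are alike" but which layer is doing the work.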
“A left-back like Marcelo” becomes useful again — but you no longer have to take it on faith. The math is on the page.
Keep reading
The transfer window is broken — and the tools made it worse
Why a market with €7B annual flow still runs on Excel, WhatsApp, and gut feel. And what changes when the data layer catches up.
Agents as network, not noise
Why filtering agents out is the lazy answer — and what changes when you verify the integrity ones and price out the unverified middle.
If this resonated, the next move is a conversation.
We onboard pilot members on rolling invitation. Send us your hardest question — we’ll send back the live answer.