Why Shield is intentionally conservative
In injury prediction, false negatives are catastrophic and false positives are cheap. We tuned the model accordingly — and the rationale is in the math.
By Shield team
In injury prediction, false negatives are catastrophic and false positives are cheap. A model that misses a hamstring tear on a €40M asset has cost a club a season. A model that flags a watch tier on a player who turns out to be fine has cost a club one extra rest day in pre-season. The asymmetry is not subtle. We tuned Shield accordingly — and the rationale is in the math, not the marketing.
Why football injury models usually fail
Most academic injury models report impressive accuracy and quietly miss the worst injuries. The reason is structural. Hamstring tears, ACL ruptures, and stress fractures are rare events. A naïve model that predicts “no injury” every day for a fit Premier League squad is right ~98% of the time. Accuracy is the wrong metric. Recall on the rare class is the right one — and it is brutal.
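A minimal sketch of that base-rate trap, on fabricated labels (the ~2% positive rate and the squad size are illustrative, not Shield's data):

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Fabricated squad-season: 25 players x 300 days, with ~2% of player-days
# falling inside a pre-injury window (illustrative rate only).
rng = np.random.default_rng(0)
y_true = rng.random(25 * 300) < 0.02

# The naive model: predict "no injury" every day, for everyone.
y_pred = np.zeros_like(y_true)

print(f"accuracy: {accuracy_score(y_true, y_pred):.3f}")  # ~0.98, looks great
print(f"recall:   {recall_score(y_true, y_pred):.3f}")    # 0.000, misses every injury
```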
Worse, the deployment context is unforgiving. A club doesn’t get a clean test set. It gets one player, one workload, one match coming up on Saturday. A model that cannot give an actionable signal at the moment a decision is made is not a model. It is a slide.
The four tiers, and why we pick four
Shield outputs a tier, not a probability. Four tiers: low, watch, elevated, high. The thresholds are calibrated against historical 30/60/90-day injury outcomes, then deliberately shifted to err on the cautious side at the watch and elevated boundaries.
Three tiers would force false confidence: green / amber / red, with “amber” doing too much work. Five tiers would be cosmetic — humans don’t reliably distinguish five categories of risk under match-day pressure. Four is the smallest set that lets us separate the “you should think about it” signal from the “you should act on it” signal without losing the “fine, no signal yet” baseline.
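Mechanically, the tier mapping is nothing more than ordered thresholds. A sketch, where the boundary values are placeholders rather than the calibrated, caution-shifted ones described above:

```python
# Placeholder boundaries for illustration -- Shield's real thresholds are
# calibrated against 30/60/90-day outcomes and then shifted toward caution.
TIERS = [
    (0.05, "low"),       # no signal yet
    (0.15, "watch"),     # think about it
    (0.35, "elevated"),  # act on it
]

def tier(risk: float) -> str:
    """Map a calibrated risk score in [0, 1] to one of four tiers."""
    for upper, name in TIERS:
        if risk < upper:
            return name
    return "high"

assert tier(0.02) == "low"
assert tier(0.20) == "elevated"
```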
Conservative by design
Conservatism, here, is a precise engineering choice. Three places it shows up:
- Asymmetric loss. The training loss penalises false negatives at 7× the weight of false positives. The number isn’t arbitrary; it comes from a rough cost ratio between “player out for 4 weeks” and “player rested one extra day”. (Both this and the threshold drift below are sketched in code after the list.)
- Threshold drift. The watch and elevated thresholds drift toward caution mid-season as fatigue accumulates. A model that uses the same threshold in October and April ignores known biology.
- Ensembling with disagreement preservation. The LSTM workload head, the Cox survival head, and the Random Forest anomaly head vote separately. We surface the disagreement when it’s sharp: two calm heads vs. one shouting head is itself a signal worth showing.
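A minimal sketch of the first two choices. The 7× weight is the number from the list above; the base threshold, the linear drift schedule, and the 40-week season are placeholder assumptions:

```python
import numpy as np

FN_WEIGHT = 7.0  # rough cost ratio: player out four weeks vs. one extra rest day

def asymmetric_bce(y_true: np.ndarray, p_pred: np.ndarray) -> float:
    """Binary cross-entropy that charges a missed injury (false negative)
    7x what it charges a spurious flag (false positive)."""
    eps = 1e-7
    p = np.clip(p_pred, eps, 1 - eps)
    per_sample = -(FN_WEIGHT * y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(per_sample.mean())

def cautious_threshold(base: float, week: int, total_weeks: int = 40) -> float:
    """Drift a tier boundary downward (more cautious) as the season wears on.
    The linear schedule is a placeholder, not Shield's calibration."""
    return base * (1.0 - 0.25 * week / total_weeks)

y = np.array([1.0, 0.0, 0.0, 1.0])
p = np.array([0.2, 0.1, 0.3, 0.9])
print(asymmetric_bce(y, p))               # the near-miss at p=0.2 dominates the loss

print(cautious_threshold(0.15, week=8))   # October: ~0.14
print(cautious_threshold(0.15, week=36))  # April: ~0.12, so flags fire earlier
```

In tree-land the same asymmetry is a one-liner, e.g. sklearn’s RandomForestClassifier(class_weight={0: 1, 1: 7}).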
What conservative does and doesn’t mean
It does not mean flagging everyone red. A model that screams continuously is useless — clubs ignore it inside two weeks, and we’ve seen exactly this happen with two prior commercial systems. The cost of a noisy flag is real, just smaller than the cost of a missed flag.
It does mean being willing to flag watch on a perfectly fit player and be wrong publicly. The product handles this with a feature attribution panel: when Shield flags watch, the panel shows the workload anomaly, the historical peer trajectory, and the survival-curve position. The flag is auditable. The conservative bias is auditable. Clubs can argue with it — and often should.
What sits behind the tier
Shield combines three model heads, each tuned to a different signal class.
The workload head is an LSTM trained on session GPS where available and on broadcast-derived player-load proxies where it isn’t. It reads the last 21 days as a sequence and flags the acute-to-chronic workload deviations that the sports-science literature has been consistent about for two decades.
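For orientation, the classical hand-computed version of that deviation signal is the acute:chronic workload ratio, sketched below with the textbook 7- and 28-day windows. The LSTM learns the deviation pattern from the raw sequence rather than computing this ratio, and the session loads here are fabricated:

```python
import pandas as pd

def acwr(daily_load: pd.Series, acute_days: int = 7, chronic_days: int = 28) -> pd.Series:
    """Acute:chronic workload ratio: short rolling mean over long rolling mean.
    The 7/28-day windows are the literature's defaults, not necessarily Shield's."""
    acute = daily_load.rolling(acute_days, min_periods=acute_days).mean()
    chronic = daily_load.rolling(chronic_days, min_periods=chronic_days).mean()
    return acute / chronic

# Fabricated daily session loads (arbitrary units); sustained ratios well
# above ~1.5 are the deviation zone the literature flags.
loads = pd.Series([420, 380, 510, 0, 450, 600, 520] * 6)
print(acwr(loads).dropna().tail())
```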
The survival head is a Cox proportional-hazards model with time-varying covariates: age, position, injury history, recent minutes, opposition intensity. It estimates the hazard function for the next 90 days. We surface its 30-day, 60-day, and 90-day projections separately because the action menu is different at each horizon.
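A minimal sketch of that head’s shape, using the lifelines library (our pick for illustration; the post names no library) and a static-covariate fit standing in for the time-varying one, on a few fabricated rows:

```python
import pandas as pd
from lifelines import CoxPHFitter

# Tiny fabricated sample: days until injury (or censoring at 90) plus two
# of the covariates named above. Real training data is far larger.
df = pd.DataFrame({
    "days_to_event":  [34, 90, 12, 61, 90, 45, 90, 22],
    "injured":        [1,  0,  1,  1,  0,  1,  0,  1],
    "age":            [29, 26, 31, 24, 28, 33, 21, 30],
    "recent_minutes": [540, 410, 620, 480, 530, 700, 250, 390],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="days_to_event", event_col="injured")

# Survival projections at the three horizons the product surfaces separately.
player = df[["age", "recent_minutes"]].iloc[[0]]
print(cph.predict_survival_function(player, times=[30, 60, 90]))
```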
The anomaly head is a Random Forest trained to recognise the pattern that precedes the worst-class injuries — not the injury itself, but the fingerprint of the two weeks before it. It is the head most likely to disagree with the others. When it disagrees and is right, it is right early. We treat disagreement as a feature, not a bug.
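What “disagreement as a feature” can look like in code, as a sketch; the head names, score scale, and 0.35 gap are illustrative assumptions:

```python
def disagreement(scores: dict[str, float], gap: float = 0.35) -> str | None:
    """Return a message when one head's score stands far above the rest,
    instead of averaging the outlier away."""
    loudest = max(scores, key=scores.get)
    calm = max(v for k, v in scores.items() if k != loudest)
    if scores[loudest] - calm >= gap:
        return f"{loudest} disagrees sharply: {scores[loudest]:.2f} vs {calm:.2f}"
    return None

heads = {"workload_lstm": 0.12, "survival_cox": 0.18, "anomaly_rf": 0.71}
print(disagreement(heads))  # the anomaly head shouting alone is surfaced, not smoothed
```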
What we don’t pretend
Shield is not a medical device. It does not replace a club doctor, a sports scientist, or a head of performance. It is a signal layer over the data that already exists — broadcast load proxies for everyone, GPS and biometrics for opt-in private streams. It is calibrated to be useful inside a real Tuesday decision, not to win a Kaggle leaderboard.
The clubs that get the most out of it treat it the way a good captain treats a radar: as one input among several, with a known false-positive bias, and a seven-times-worse false-negative cost. That is the design. The conservatism is the point.