A 90-minute match,
scouted in three.
Vision is a computer-vision pipeline that watches a match the way a great scout would — except in three minutes, in the same format every time, on every player on the pitch.
In one sentence
Drop in a match link.
Get a structured scouting report — and the highlight reel — in less time than a coffee.
Most clubs spend 4–6 hours per match per scout on video review. Vision does the first 80% of that work — tracking, event detection, role-adjusted clipping — so your scout starts with the questions, not the prep.
The output is deliberately standardized. Every Vision report follows the same structure: identification, role, in-possession, out-of-possession, set-pieces, psychometrics inferred from body language, and a flagged-moments index. Your reports become comparable. Finally.
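That fixed structure can be pictured as a simple schema. This is an illustrative sketch only — the field names and types below are assumptions, not Vision's actual report format:

```python
from dataclasses import dataclass, field

@dataclass
class FlaggedMoment:
    timestamp_s: float   # seconds into the match
    label: str           # e.g. "line-breaking carry"

@dataclass
class ScoutingReport:
    # The seven sections every report shares, in fixed order.
    identification: dict         # name, club, age, footedness, ...
    role: str                    # role scouted for, e.g. "#6"
    in_possession: dict
    out_of_possession: dict
    set_pieces: dict
    psychometrics: dict          # inferred from body language
    flagged_moments: list = field(default_factory=list)

report = ScoutingReport(
    identification={"name": "Player A"},
    role="#6",
    in_possession={}, out_of_possession={},
    set_pieces={}, psychometrics={},
)
report.flagged_moments.append(FlaggedMoment(timestamp_s=1834.2, label="press trigger"))
```

Because every report shares one schema, two reports can be diffed field by field — which is what makes cross-match comparison possible.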
What Vision sees that the human eye misses.
✓
Player tracking
Every touch, every off-ball run, every defensive recovery — across all 22 players, full match.
✓
Pose estimation
First-touch quality, body-shape on receiving, scanning frequency before touches.
✓
Event detection
Progressive passes, line-breaking carries, press triggers, second balls — labelled and timestamped.
✓
Auto-clipped reels
Per-player highlight + lowlight reel, generated from the events the player actually drove.
✓
Role context
Reports adjust expectations to the role you’re scouting for — a #6 isn’t graded against a #10.
✓
Cross-match consistency
Every report follows the same template, so a scout can compare 12 games in a sitting.
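Role-adjusted grading can be sketched as a weighted score over per-90 event rates. The role profiles and event categories below are hypothetical, purely to show the idea that a #6 and a #10 weight the same events differently:

```python
# Hypothetical role profiles: weights over event categories (sum to 1.0).
ROLE_WEIGHTS = {
    "#6":  {"progressive_pass": 0.3, "ball_recovery": 0.4, "dribble": 0.1, "shot": 0.2},
    "#10": {"progressive_pass": 0.2, "ball_recovery": 0.1, "dribble": 0.3, "shot": 0.4},
}

def role_adjusted_score(event_rates, role):
    """Weight per-90 event rates by what the scouted role actually demands."""
    weights = ROLE_WEIGHTS[role]
    return sum(weights[k] * event_rates.get(k, 0.0) for k in weights)

# A ball-winning midfielder's per-90 numbers score well as a #6, poorly as a #10.
rates = {"progressive_pass": 6.0, "ball_recovery": 8.0, "dribble": 1.0, "shot": 0.5}
print(round(role_adjusted_score(rates, "#6"), 2))   # 5.2
print(round(role_adjusted_score(rates, "#10"), 2))  # 2.5
```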
The stack that watches the match.
Detection · YOLOv11
State-of-the-art object detection trained on football-specific datasets (SoccerNet, custom labelled clips). Real-time on a single GPU.
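Detectors in this family emit overlapping candidate boxes, which are pruned with non-maximum suppression before anything downstream sees them. A miniature version of that standard post-processing step (not Vision's code) looks like this:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if not inter:
        return 0.0
    return inter / ((a[2] - a[0]) * (a[3] - a[1])
                    + (b[2] - b[0]) * (b[3] - b[1]) - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the best-scoring box,
    drop any remaining box that overlaps it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] — the near-duplicate of box 0 is suppressed
```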
Tracking · ByteTrack
Multi-object tracking with re-identification. Players keep their IDs through occlusions, substitutions, and camera cuts.
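The core idea in ByteTrack is two-stage association: high-confidence detections claim existing tracks first, then low-confidence detections get a second pass to rescue tracks that would otherwise be lost (e.g. through partial occlusion). A stripped-down, greedy-IoU sketch of one frame — illustrative, not the published algorithm's exact matching:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if not inter:
        return 0.0
    return inter / ((a[2] - a[0]) * (a[3] - a[1])
                    + (b[2] - b[0]) * (b[3] - b[1]) - inter)

def associate(tracks, detections, thresh=0.3):
    """Greedily match {track_id: box} to (box, conf) detections by IoU."""
    matches, unmatched = {}, set(range(len(detections)))
    for tid, tbox in tracks.items():
        best, best_iou = None, thresh
        for d in unmatched:
            score = iou(tbox, detections[d][0])
            if score > best_iou:
                best, best_iou = d, score
        if best is not None:
            matches[tid] = best
            unmatched.discard(best)
    return matches

def bytetrack_step(tracks, detections, conf_split=0.6):
    """One frame: high-confidence detections claim tracks first;
    leftover tracks get a second pass against low-confidence ones."""
    high = [d for d in detections if d[1] >= conf_split]
    low = [d for d in detections if d[1] < conf_split]
    matches = associate(tracks, high)
    leftover = {t: b for t, b in tracks.items() if t not in matches}
    rescued = associate(leftover, low)
    return matches, rescued

tracks = {7: (0, 0, 10, 10), 8: (50, 50, 60, 60)}
detections = [((1, 1, 11, 11), 0.9), ((49, 51, 61, 61), 0.4)]
matches, rescued = bytetrack_step(tracks, detections)
# Track 7 matches in the first pass; track 8 is rescued by the low-confidence box.
```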
Pose · YOLOv8-Pose
Keypoint estimation for body-shape, first-touch quality, and scanning. Seventeen COCO keypoints per player per frame.
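Scanning frequency, for instance, falls out of the keypoints: count how often a player's head direction changes in the frames before a touch. The keypoint layout and window below are hypothetical, just to show the shape of the computation:

```python
def head_direction(nose_x, left_shoulder_x, right_shoulder_x):
    """Crude gaze proxy: is the nose left or right of the shoulder midline?"""
    mid = (left_shoulder_x + right_shoulder_x) / 2
    return "left" if nose_x < mid else "right"

def scan_count(frames, touch_frame, window=50):
    """Count head-direction changes in the `window` frames before a touch.
    `frames` maps frame index -> (nose_x, l_shoulder_x, r_shoulder_x)."""
    dirs = [head_direction(*frames[i])
            for i in range(max(0, touch_frame - window), touch_frame)
            if i in frames]
    return sum(1 for a, b in zip(dirs, dirs[1:]) if a != b)

# Player glances left, right, left in the run-up to a touch at frame 100.
poses = {97: (4.0, 3.0, 7.0), 98: (6.0, 3.0, 7.0), 99: (4.0, 3.0, 7.0)}
print(scan_count(poses, touch_frame=100))  # 2 direction changes
```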
Action heads · CNN+LSTM
Spatio-temporal classifier trained on annotated event sequences from StatsBomb and SoccerNet. Outputs labels at 25 FPS.
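Per-frame labels at 25 FPS still have to become the timestamped events a scout reads. A sketch of that post-processing step — collapsing runs of identical frame labels into (start, end, label) events — under the assumption that background frames are labelled `None`:

```python
FPS = 25  # per the classifier's output rate

def frames_to_events(frame_labels):
    """Collapse per-frame labels into (start_s, end_s, label) events,
    dropping background frames labelled None."""
    events, start, current = [], None, None
    for i, label in enumerate(frame_labels + [None]):  # sentinel flushes the last run
        if label != current:
            if current is not None:
                events.append((start / FPS, i / FPS, current))
            start, current = i, label
    return events

labels = [None] * 25 + ["progressive_pass"] * 10 + [None] * 15 + ["press_trigger"] * 25
print(frames_to_events(labels))
# [(1.0, 1.4, 'progressive_pass'), (2.0, 3.0, 'press_trigger')]
```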
Re-ID · Siamese networks
Maintains identity across camera angles and broadcasts. Lets us merge multiple sources into a single timeline.
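Merging feeds boils down to matching per-player appearance embeddings across sources. The Siamese network produces those embeddings; the matching step itself can be sketched with cosine similarity and toy vectors (IDs and threshold below are hypothetical):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def merge_identities(feed_a, feed_b, thresh=0.9):
    """Map each player ID in feed B to its best embedding match in
    feed A, so both sources share a single timeline."""
    mapping = {}
    for b_id, b_emb in feed_b.items():
        best_id, best_sim = None, thresh
        for a_id, a_emb in feed_a.items():
            sim = cosine(a_emb, b_emb)
            if sim > best_sim:
                best_id, best_sim = a_id, sim
        if best_id is not None:
            mapping[b_id] = best_id
    return mapping

broadcast = {"A7": [0.9, 0.1, 0.4], "A10": [0.1, 0.8, 0.6]}
tactical = {"B3": [0.88, 0.12, 0.41], "B5": [0.5, 0.5, 0.5]}
print(merge_identities(broadcast, tactical))  # B3 is the same player as A7
```

Below the threshold, a player stays unmerged rather than being force-matched — a wrong merge is worse than a gap in the timeline.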
Tactical layer · GNN
Graph neural network over player positions to infer formations, line-shape, and pressing triggers.
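As a much simpler stand-in for what the GNN infers, a formation label can be read off by banding outfield players by depth. This heuristic is purely illustrative — pitch length, band edges, and IDs are assumptions:

```python
def formation(positions, gk_id, bands=((0, 35), (35, 70), (70, 105))):
    """Toy formation read: count outfield players per pitch band
    (x = metres from own goal line on a 105 m pitch). The real
    tactical layer reasons over the full position graph."""
    counts = [0] * len(bands)
    for pid, (x, y) in positions.items():
        if pid == gk_id:
            continue
        for i, (lo, hi) in enumerate(bands):
            if lo <= x < hi:
                counts[i] += 1
                break
    return "-".join(str(c) for c in counts)

positions = {"gk": (5, 34)}
positions.update({f"d{i}": (20, 10 + 15 * i) for i in range(4)})  # back four
positions.update({f"m{i}": (50, 15 + 18 * i) for i in range(3)})  # midfield three
positions.update({f"f{i}": (80, 15 + 18 * i) for i in range(3)})  # front three
print(formation(positions, "gk"))  # 4-3-3
```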
Send a match link. We’ll send the report.
During the pilot, we run Vision live for any candidate club. You send a public match link — broadcast feed or tactical cam — and we send back the auto-report and clipped reels for any player you name.