Closed beta

Engine 02 · Vision

A 90-minute match,
scouted in three.

Vision is a computer-vision pipeline that watches a match the way a great scout would — except in three minutes, in the same format every time, on every player on the pitch.

In one sentence

Drop in a match link.
Get a structured scouting report — and the highlight reel — in less time than a coffee.

Most clubs spend 4–6 hours per match per scout on video review. Vision does the first 80% of that work — tracking, event detection, role-adjusted clipping — so your scout starts with the questions, not the prep.

The output is deliberately standardized. Every Vision report follows the same structure: identification, role, in-possession, out-of-possession, set-pieces, psychometrics inferred from body language, and a flagged-moments index. Your reports become comparable. Finally.

Vision · sample outputMatch: Vitória SC vs FC Porto · 18 Apr · 90'

● Tracking · 22 players · 71'14

events.detected

71’02progressive_pass#10 → #9

71’08turnover#10 / press

71’14shot_attempt · GOAL#10 / RIGHT_FOOT

74’21high_press_recovery#10 / 6.2 m

83’39second_assist#10 → #7 → #9

Capabilities

What Vision sees that the human eye misses.

◯

✓

Player tracking

Every touch, every off-ball run, every defensive recovery — across all 22 players, full match.

◇

✓

Pose estimation

First-touch quality, body-shape on receiving, scanning frequency before touches.

⚑

✓

Event detection

Progressive passes, line-breaking carries, press triggers, second balls — labelled and timestamped.

▶

✓

Auto-clipped reels

Per-player highlight + lowlight reel, generated from the events the player actually drove.

⌖

✓

Role context

Reports adjust expectations to the role you’re scouting for — a #6 isn’t graded against a #10.

☷

✓

Cross-match consistency

Every report follows the same template, so a scout can compare 12 games in a sitting.

Inside the pipeline

The stack that watches the match.

Detection · YOLOv11

State-of-the-art object detection trained on football-specific datasets (SoccerNet, custom labelled clips). Real-time on a single GPU.

Tracking · ByteTrack

Multi-object tracking with re-identification. Players keep their IDs through occlusions, substitutions, and camera cuts.

Pose · YOLOv8-Pose

Keypoint estimation for body-shape, first-touch quality, and scanning. Thirty-one keypoints per player per frame.

Action heads · CNN+LSTM

3D spatio-temporal classifier trained on annotated event sequences from StatsBomb and SoccerNet. Outputs labels at 25 FPS.

Re-ID · Siamese networks

Maintains identity across camera angles and broadcasts. Lets us merge multiple sources into a single timeline.

Tactical layer · GNN

Graph neural network over player positions to infer formations, line-shape, and pressing triggers.

Vision · what ships today

Honest disclosure

v0Design phase
Architecture documented and components prototyped in isolation. No production inference path yet — every page that mentions Vision shows the "Design phase" disclosure. The pipeline economics essay (/resources/vision-pipeline-cost) describes the GPU budget once we wire it.
v0.1 alphaIn training· lands after pilot data-sharing agreements
YOLOv11 + ByteTrack tracking on a curated 50-match broadcast set. Behind a queue with manual review gate before any output ships back to a club. Closed beta with two pilot clubs first.
v0.2Design phase
YOLOv8-Pose + CNN+LSTM action heads + Siamese re-ID + GNN tactical formation classifier. Full per-match auto-report ≤ 3 minutes at < €1 GPU spend.

Visuals on this page are illustrative renderings of the planned output, not live model inference. We do not run YOLO in production today.

See Vision live

Send a match link. We’ll send the report.

During pilot, we run Vision live for any candidate club. You send a public match link — broadcast feed or tactical cam — and we send back the auto-report and clipped reels for any player you name.

Request a sample report See Shield next

A 90-minute match,scouted in three.

Drop in a match link.Get a structured scouting report — and the highlight reel — in less time than a coffee.