ScoutAtlas
Engineering · 10 Jun 2026 · 7 min read

What it costs to scout 90 minutes in three

A walkthrough of the Vision pipeline’s GPU economics and why the right cost target is "less than a coffee per match", not "free".

By Vision team

“Free AI” is a marketing line. Vision is not free. Watching a 90-minute match, ingesting the broadcast feed, running detection, tracking, pose, action heads, re-identification, and producing a structured 500-word report — that costs real GPU minutes and real electricity. The interesting question is not whether it costs something. It is how much, and what the right cost target should be.

Our target is straightforward: less than a coffee per match. Not free. Not heroic. Less than the espresso the scout would have bought instead.

What runs in the pipeline

A single match goes through six stages. Each one has a real GPU bill attached.

  • Ingest and pre-process. Pull the broadcast feed, transcode to a uniform 1080p / 25fps internal format, strip the obvious replays. CPU-bound with a small GPU assist for transcoding. Cheap.
  • Detection. YOLOv11 across every frame. The dominant cost on the GPU. Optimised with TensorRT, batched aggressively, runs at ~3× real-time on an L4.
  • Tracking. ByteTrack, on the YOLO outputs. Mostly CPU, but the data movement matters. We co-locate it with detection to avoid the round trip.
  • Pose. YOLOv8-Pose, only on tracked players in the active half. Skipping the inactive half is a meaningful win — it’s about 40% of frames.
  • Action and tactical heads. A CNN+LSTM stack for action classification, a small GNN for formation. Cheap on GPU; expensive on engineering.
  • Re-ID and report. Siamese network for player re-identification across camera cuts; structured-prose generator for the 500-word report. The report generator is the only LLM call in the pipeline and it’s small and short.
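The pose-gating win in stage four is easy to put numbers on. A minimal sketch of the per-match frame budget at the pipeline's internal 1080p / 25fps format — the 40% inactive-half figure is the post's estimate, and the variable names are illustrative, not the pipeline's actual code:

```python
# Frame budget for one 90-minute match at the internal 25 fps format.
FPS = 25
MATCH_MINUTES = 90

total_frames = FPS * 60 * MATCH_MINUTES        # 135,000 frames per match

# Detection runs on every frame; pose only on tracked players
# in the active half, which skips roughly 40% of frames.
INACTIVE_HALF_FRACTION = 0.40                  # post's rough estimate

detection_frames = total_frames
pose_frames = int(total_frames * (1 - INACTIVE_HALF_FRACTION))

print(detection_frames)  # 135000
print(pose_frames)       # 81000 — ~54,000 pose inferences avoided
```

At 135,000 frames a match, every stage you can gate to a subset of frames is a direct line on the GPU bill.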

What it costs, honestly

On a single L4 instance, end-to-end Vision processes a 90-minute match in roughly 25–32 GPU-minutes, depending on action density and how much of the frame is occluded. At spot pricing for an L4 in our default region, that lands the per-match GPU cost at somewhere between €0.20 and €0.45. Add storage, ingest egress, and the LLM call for the report, and we are reliably under €1 all-in.
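The arithmetic behind that band is short enough to show. A back-of-envelope sketch — the spot price below is an assumed round number for illustration, not a quoted rate; real L4 spot pricing varies by region and hour:

```python
# Per-match GPU cost from GPU-minutes and an hourly spot rate.
L4_SPOT_EUR_PER_HOUR = 0.60          # assumed for illustration
EUR_PER_GPU_MINUTE = L4_SPOT_EUR_PER_HOUR / 60

def match_gpu_cost(gpu_minutes: float) -> float:
    """GPU cost in euros for one match's worth of pipeline time."""
    return gpu_minutes * EUR_PER_GPU_MINUTE

light = match_gpu_cost(25)   # sparse match: ~€0.25
dense = match_gpu_cost(32)   # action-heavy match: ~€0.32
```

Both ends land inside the €0.20–0.45 band from the paragraph above; storage, egress, and the small report LLM call are what fill the gap to the under-€1 all-in figure.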

That is below the espresso threshold. The cost is real, but it is the kind of real that scales: the marginal match is cheap, the model isn’t getting more expensive over time, and the reporting volume per match is a steady ~500 words.

What “free” would actually cost

We could push the per-match cost lower with three obvious moves: distillation, quantisation, and aggressive frame skipping. Each is a real lever, with a real bill.

  • Distillation. Distilling the full YOLOv11 teacher into a YOLO-Nano student gets the detection cost down ~35%. The cost is two weeks of training and ~3% recall on small-object cases — the ball when partially occluded by a defender, mostly. We will probably do it. We will not pretend it’s free.
  • Quantisation to INT8. Another ~25% off detection latency. Negligible quality loss in our calibration set. Already on the roadmap; held back by the fact that the YOLO ecosystem still has corner cases at INT8 we haven’t debugged.
  • Frame skipping. Process every third frame, interpolate the rest with an optical-flow head. The biggest single win, also the biggest single quality risk — a missed frame is a missed shot, and shots are why anyone is watching.
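The three levers compound multiplicatively on the detection stage. A rough sketch, using the post's estimates — the optical-flow overhead figure is an assumption we have not measured, included only to show that frame skipping is not a clean 3× win:

```python
# Stacked savings on the detection stage. Factors are the post's
# rough estimates; treat the result as order-of-magnitude only.
distill_factor = 1 - 0.35    # distillation: ~35% off detection cost
int8_factor    = 1 - 0.25    # INT8 quantisation: ~25% off latency
skip_factor    = 1 / 3       # every third frame through the detector

FLOW_OVERHEAD = 0.05         # assumed cost of the optical-flow head

remaining = distill_factor * int8_factor * (skip_factor + FLOW_OVERHEAD)
print(f"{remaining:.2f}")    # ~0.19 of today's detection cost
```

Roughly a 5× cheaper detector if all three land — cheaper, not free, and each factor carries the quality cost listed above.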

Each of these gets us closer to “a tea per match”. None gets us to free. We are wary of any vendor whose answer to GPU economics is a confident “zero”.

Why the cost target matters

The cost target shapes the product. At €0.50 per match, Vision is something you run on every priority target every week without asking permission. At €5 per match, Vision is a budget line item that requires a reason. At €50 per match, Vision is a set-piece feature that exists in marketing but never gets switched on.

We pick the target deliberately. The product needs to be the kind of thing a scout runs for the same reason they’d open a clip — because the marginal cost is small enough that hesitation is not a feature. If we missed the cost target by an order of magnitude, the product would be different. The scout would batch, defer, queue, triage. The pipeline’s economics is the product’s ergonomics.

What we will not do

We will not run the pipeline on top of a third-party “video AI” API and charge a markup. We tried two of them in the prototype phase. The per-match cost was 12× ours at lower quality, the latency was 4×, and the failure mode on degraded broadcasts was a silent JSON shrug. The owned pipeline is more work. It is also the only way to hit the cost target without lying about it.

Our commitment, for as long as it stays achievable: a Vision report on any match in the corpus, at less than a coffee. We will write down the moment that stops being true. Until then, run more Vision.

Talk to a founder

If this resonated, the next move is a conversation.

We onboard pilot members on rolling invitation. Send us your hardest question — we’ll send back the live answer.