Halo Goods case study

Context

Halo Goods sells direct-to-consumer with revenue concentrated in flash sales and Black Friday week. The previous Black Friday, a scaling failure took checkout down for 90 minutes at peak.

A three-person platform team managed capacity with spreadsheets and manual GKE node-pool resizes.

The challenge

Flash-sale traffic arrives in minutes, not hours — reactive autoscaling alone couldn't keep p95 latency inside budget during the ramp.

Infra cost per order had risen every quarter; the easy fix (permanent over-provisioning) was exactly what the CFO vetoed.

Approach

Load modeling & game days
Weeks 1–3
Replayed historical sale traffic against a production-shaped environment, found the four bottlenecks (checkout DB connections chief among them), and fixed them before touching the platform.
Self-service platform
Weeks 4–9
Crossplane compositions gave product teams one-line provisioning for services, queues, and databases — with sizing, policies, and observability baked in. Tekton golden-path pipelines standardized deploys.
Capacity intelligence
Weeks 10–12
Thanos federated metrics across clusters; capacity forecasting drives scheduled pre-scaling ahead of announced sales, with HPA handling the unannounced spikes.

Architecture

Edge

Cloud CDNGlobal LBCloud Armor

Delivery

Tekton pipelinesGolden pathsCrossplane APIs

Platform

GKE regionalHPA + pre-scaleSpot node pools

Observability

ThanosCapacity forecastsSLO burn alerts

Self-service provisioning via Crossplane; scheduled pre-scale + HPA absorbs flash-sale ramps.

Results

14×

peak absorbed

Black Friday peak vs. baseline, p95 latency inside budget throughout.

P0 incidents

Across Black Friday week and every flash sale since.

37%

lower infra cost / order

Spot pools + right-sizing + scale-to-zero for off-peak services.

2 min

p95 scale-out

Down from 11 minutes — measured from spike start to capacity-ready.

“Last year we watched dashboards and prayed. This year the platform pre-scaled itself and the on-call engineer watched a movie.”
— Head of Engineering, Halo Goods

Stack

GKECrossplaneThanosTekton

Context

The challenge

Approach

Load modeling & game days

Self-service platform

Capacity intelligence

Architecture

Results

Stack