Hourglass Design
The Hourglass Design is a way to think about system design as data transformation: consume data from a source, turn it into something useful, and present it to the end user at the right place, at the right time, and in the right format.
This is similar to just-in-time manufacturing: do not produce every possible output upfront. Understand demand first, then decide what must be prebuilt, what can be assembled on demand, and what should be delivered only when the user or downstream system needs it. In system design terms, that means choosing what to store raw, what to transform ahead of time, what to compute on read, and what to cache close to the consumer.
The top of the hourglass is broad because many things can produce data: users, devices, services, partners, files, jobs, and webhooks. The middle narrows because a good system needs a clear transformation contract: what data means, where it is stored, how it is queried, and what business logic turns input into useful output. The bottom widens again because transformed data may be presented through mobile apps, dashboards, notifications, exports, APIs, search, feeds, or analytics.
Use the hourglass with the interview flow from Interview Tips: clarify the scope, draw the high-level data flow, then deep dive where the design risk is highest. The six core blocks are Source, Type, Storage, Pattern, Logic, and Presentation. Around them, the cross-cutting concerns are Security, Scalability, Reliability, Observability, and Infrastructure.
The short version:
Source tells you what Data A is
Type tells you what Data A means
Storage tells you what state must be kept
Pattern tells you how Data B will be requested
Logic tells you how Data A becomes Data B
Presentation tells you how Data B is delivered
Hourglass Checklist
Use this as a vertical checklist while designing. Each step should produce an answer that makes the next step easier. Start with one unit of Data A, scale it into a dataset, then follow the transformation until Data B is ready for the final user-facing output.
1. Source: what does one producer emit?
Core question: for a single producer, what is one unit of Data A?
- Producer: user, device, service, partner, file, scheduled job, or webhook.
- Event: the payload emitted by that producer.
- Ingress: protocol or mechanism: REST, gRPC, MQTT, webhook, queue, upload, or batch job.
- Identity and trust: source ID, event ID, auth, validation, and deduplication.
Output: one clear Data A event. This feeds Type because you now know what fields must be understood at scale.
2. Type: what does Data A look like across all producers?
Core question: when many producers emit Data A, what dataset does that create?
- Schema: required fields, optional fields, validation rules, and schema evolution.
- Representation: JSON, Protobuf, Avro, CSV, image, video, log line, or binary payload.
- Meaning: units, timestamps, time zones, IDs, encoding, enums, and field semantics.
- Scale shape: producer count × per-producer frequency × event size.
Output: schema, event size, event rate, and rough daily volume. This feeds Storage because volume and shape determine what can be stored, indexed, or discarded.
3. Storage: where do Data A and Data B live?
Core question: what must be stored as source-of-truth Data A, and what can be rebuilt as derived Data B?
- Source of truth: canonical records, raw events, objects, or transaction state.
- Derived state: feeds, search indexes, aggregates, reports, caches, and analytics.
- Lifecycle: mutability, retention, archival, deletion, and consistency needs.
Output: what is stored, what is derived, how long it lives, and how consistent it must be. This feeds Pattern because stored state must match how Data B will be read.
4. Pattern: how will Data B be requested?
Core question: what access pattern does Data B need to support?
- Lookup: by ID, owner, time range, location, search term, relationship, or tenant.
- Shape: single object, list, feed, aggregate, ranking, map, graph, export, or stream.
- Freshness and scope: latest vs stale, global vs tenant/user/region/permission scoped.
Output: query shape, indexes, partition keys, cache keys, and freshness expectations. This feeds Logic because the transformation should produce Data B in the shape consumers need.
5. Logic: how does Data A become Data B?
Core question: what validation, enrichment, computation, and side effects turn Data A into useful Data B?
- Validation: reject invalid, duplicate, unauthorized, or malformed input.
- Transformation: enrich, join, normalize, classify, geocode, rank, score, summarize, or aggregate.
- Timing: compute on write, on read, on a schedule, or continuously in a stream.
- Side effects: notifications, search indexing, analytics, billing, emails, and webhooks.
Output: the transformation plan from Data A to Data B. This feeds Presentation because the final surface should receive useful, timely, correctly shaped data.
6. Presentation: how is Data B delivered just in time?
Core question: who needs Data B, where do they need it, when do they need it, and in what format?
- Audience: end users, admins, partners, internal operators, analysts, or other systems.
- Delivery: mobile, web, dashboard, notification, API, export, embedded widget, or report.
- Timing: instant, live, eventually, scheduled, or on demand.
- Format: table, chart, map, feed, ranking, card, file, alert, or machine-readable response.
Output: the delivery contract for Data B: audience, place, time, format, latency, and permissions.
Cross-Cutting Checks
After the six core blocks, check the system qualities around the hourglass:
- Security: who can create, transform, store, and view each piece of data?
- Scalability: what grows: sources, objects, writes, reads, transformations, or outputs?
- Reliability: can the system retry safely, recover from failure, and rebuild derived data?
- Observability: can we see data moving from source to transformed output?
- Infrastructure: where does the logic run, and how are schema, code, index, and job changes released safely?
Numbers That Matter
The Hourglass method tells you what to decide. The sizing math tells you whether the design is plausible. For powers of two, latency units, latency estimates, and availability targets, see Numbers That Matter in System Design.
System Design Scenarios
These scenarios illustrate how to apply the Hourglass Design Method to real-world systems, from IoT to social platforms to eCommerce.
Each scenario first names Data A and Data B, then the table follows the same structure: Block → Design Choice → Justification.
Scenario 1: Realtime Temperature Monitoring (IoT Sensors)
Goal: Build a system for 1M IoT devices reporting temperature every 10s across NSW.
- Realtime heatmap (~10s latency)
- Historical dashboard (daily/weekly/monthly)
- Retention: 6 months
Data A → Data B
- Data A: raw sensor readings:
device_id, timestamp, temperature, and device metadata. - Data B: live regional heatmap cells, historical aggregates, alertable trends, and dashboard-ready time-series views.
| Block | Design Choice | Justification |
|---|---|---|
| Source | One sensor publishes one reading over MQTT Single-device payload: { device_id, lat, lon, timestamp, temperature } |
Estimate one reading first: device_id (8 B) + lat/lon doubles (16 B) + temperature double (8 B) + timestamp (8 B) = ~40 B before overhead. JSON is often ~100-200 B because field names and punctuation are included. |
| Type | Structured time-series across 1M devices 1M devices / 10s = ~100K readings/sec Fixed schema with timestamp, location, and numeric temperature JSON at ingest → binary at storage |
Type scales the single reading: ~100K readings/sec × 86,400 sec ≈ 8.64B readings/day. At ~100 B each, raw ingest is ~864 GB/day before replicas and indexes. |
| Storage | Realtime latest-value table (~16 MB) Daily aggregation (~2.9 GB / 180 days) Metadata (~20 MB) Raw readings optional: much larger if retained |
Storage follows retention: latest-value storage stays tiny, aggregate storage is compact, but retaining all raw readings for 180 days would be ~155 TB before replicas/indexes. |
| Logic / Transformation | Realtime updates per reading Daily min/max aggregation Redis for fast compare Batch writes → TimescaleDB |
Low latency ingest + efficient aggregation |
| Pattern / Access | REST polling every 10s (map) REST queries (historical) |
Polling is simple, cost-efficient for low concurrency |
| Presentation | Web map grid updated every 10s Historical dashboard with calendar filter |
Lightweight visualization for end users |
| Security | No login API throttling (CloudFront + WAF) MQTT cert-based auth |
Basic protection, open data model |
| Scalability | ~100K writes/sec Kafka/Kinesis buffer Partition DB by device_id + time Stateless API, autoscaling |
Horizontal scalability and decoupling |
| Reliability | MQTT at-least-once Retry pipeline Re-runnable daily jobs |
Ensures data completeness under failure |
| Observability | Metrics: ingest rate, write latency, last_seen Logs: ingestion + API |
Full visibility into data pipeline health |
| Infrastructure | AWS IoT Core or EMQX → Kinesis/Kafka → ECS/Fargate → TimescaleDB IaC: Terraform/CDK |
Cloud-native, modular, reproducible |
Scenario 2: Twitter-like Microblogging Platform
Goal: Design a social platform similar to Twitter.
- Realtime feed updates
- Millions of posts/day
- Support search, hashtags, user timelines
Data A → Data B
- Data A: user actions: posts, likes, follows, replies, media uploads, hashtags, and mentions.
- Data B: personalized timelines, searchable posts, notifications, counters, and profile/user timeline views.
| Block | Design Choice | Justification |
|---|---|---|
| Source | One user emits one action: create post, like, follow, reply, or upload media Example post payload: { author_id, text, media_ids, created_at } |
Estimate one post first: IDs/timestamp are ~24 B, 280 chars of English UTF-8 is ~280 B, worst-case UTF-8 can be ~1.1 KB, and media pointers/counters add tens to hundreds of bytes. |
| Type | Mixed event stream across millions of users Post, like, follow, reply, and media events have related but different schemas Text fields need UTF-8 sizing; hashtags/mentions become indexed fields |
Type scales the event family: a post metadata row is roughly ~0.5-2 KB. At 10M posts/day and ~1 KB average, post metadata is ~10 GB/day before indexes. |
| Storage | OLTP DB (Postgres/CockroachDB) for metadata Object store (S3) for media ElasticSearch for search/index |
Storage separates workloads: if 10% of 10M posts include 1 MB media, media adds ~1 TB/day, which dominates metadata storage. |
| Logic / Transformation | Fanout service builds timelines Kafka for async event distribution |
Logic can multiply writes: 10M posts/day × 200 followers average ≈ 2B timeline write events/day, so fanout should be async. |
| Pattern / Access | REST (post, follow) WebSocket/GraphQL (feed updates) |
REST reliable for writes; streaming API for low-latency feeds |
| Presentation | Web + mobile apps Infinite scroll timeline, notifications |
Optimized UX for engagement |
| Security | OAuth2 login Rate limiting (API Gateway) WAF for spam |
Standard identity + abuse protection |
| Scalability | Sharded user/tweet DB CDN for media Async fanout to caches |
Ensures horizontal scale to millions of users |
| Reliability | Durable Kafka log Retry for writes Timeline cache fallback |
Feed always eventually consistent |
| Observability | Metrics: post latency, fanout lag Logs: auth, API, feed delivery |
Critical for SLO monitoring |
| Infrastructure | AWS: API Gateway + Lambda/ECS, DynamoDB/Postgres, S3, ElasticSearch | Mix of serverless + managed DB for scale |
Scenario 3: eCommerce Platform
Goal: Design a modern eCommerce system.
- Product catalog, cart, checkout
- User accounts, payments
- Scalable search + inventory
Data A → Data B
- Data A: product updates, browsing events, cart changes, checkout requests, inventory changes, and payment callbacks.
- Data B: searchable catalog pages, available inventory, cart state, confirmed orders, receipts, and fulfillment events.
| Block | Design Choice | Justification |
|---|---|---|
| Source | One producer emits one event: product update, product view, add-to-cart, checkout request, inventory update, or payment callback | Estimate one product record first: IDs and numeric fields are small, title is ~100 B, description can be ~1-4 KB, attributes JSON ~0.5-3 KB, and media URLs ~300 B-1 KB. |
| Type | Structured event families: product/catalog records, cart events, order commands, inventory mutations, payment callbacks Catalog is read-heavy; checkout/payment is correctness-heavy |
Type separates scale profiles: catalog rows are often ~2-10 KB, cart events are small and ephemeral, while checkout/payment events need strict validation. |
| Storage | RDBMS (Aurora/MySQL) for orders/payments DynamoDB for cart sessions S3 for product media |
Storage follows object type: 1M products × ~5 KB average ≈ 5 GB catalog metadata, while 1M × 3 images × 500 KB ≈ 1.5 TB media. |
| Logic / Transformation | Inventory service decrements stock Async order events via SNS/SQS |
Checkout math drives correctness: 10K checkout/min ≈ 167 checkout/sec, but browsing/search reads may be 100x higher. |
| Pattern / Access | REST (catalog, cart, order) GraphQL (flexible queries for product search) |
REST for critical workflows; GraphQL for frontend flexibility |
| Presentation | Web + mobile storefront Search, cart, checkout flows |
Responsive UX, optimized conversions |
| Security | OAuth2 login, MFA for admin PCI-DSS compliant payment handling WAF + Shield |
Protects sensitive user/payment data |
| Scalability | Autoscaling ALB/NLB ElasticSearch for catalog search CDN for static assets |
Handles traffic spikes during sales |
| Reliability | Multi-AZ RDS Order queue with DLQ Event replay for payments |
Ensures orders are never lost |
| Observability | Metrics: checkout latency, error rate Logs: API + payment gateway |
Monitors user impact and failures |
| Infrastructure | AWS: ALB + ECS, Aurora, DynamoDB, S3, ElasticSearch, CloudFront | Mix of managed + serverless services for resilience |
Scenario 4: Short URL Service (URL Shortener)
Goal: Map long URLs to short codes with low latency, high write QPS, and massive read QPS.
- Create short code, redirect instantly
- Unique codes, collision-resistant
- Analytics (clicks, geo, referrer)
Data A → Data B
- Data A: long URL, owner, optional alias/TTL, redirect request, and click metadata such as timestamp, referrer, device, and geo.
- Data B: short code mapping, low-latency redirect response, QR/shareable link, and aggregated click analytics.
| Block | Design Choice | Justification |
|---|---|---|
| Source | One client request creates a mapping or resolves a code Create payload: { long_url, owner_id, ttl }Redirect request: GET /{code} |
Estimate one mapping first: short code is 7 ASCII chars (~7 B), owner ID is ~8 B, timestamps/TTL ~16 B, and long URL is often ~100-2,000 chars. |
| Type | Two main data shapes: mapping records and click events Mapping records are small but durable; click events are append-only and much higher volume |
Type separates durable mappings from click logs: mapping rows are often ~200 B-2 KB, while click events with IP/referrer/user-agent/geo are often ~300 B-1 KB. |
| Storage | DynamoDB (PK=code) for mapping; S3 for logs | Storage separates hot lookups and analytics: 100M URLs × ~500 B ≈ 50 GB mapping metadata; 1B click logs/day × ~500 B ≈ 500 GB/day in logs. |
| Logic / Transformation | Code gen via base62/ULID; optional custom alias; async analytics (Kinesis) | Code-space math guides collision strategy: 62^7 ≈ 3.5T possible codes, enough for 100M URLs with a large safety margin. |
| Pattern / Access | REST + 301/302 redirect; rate-limits per owner | Browser-native redirect semantics; abuse protection |
| Presentation | Simple web console + CLI; QR export | Low-friction creation and sharing |
| Security | Auth (API keys/OAuth); domain allowlist; malware scanning | Prevents phishing/abuse; protects brand domains |
| Scalability | CloudFront → Lambda@Edge redirect cache; hot keys sharded | Edge-cached redirects minimize origin load/latency |
| Reliability | Multi-Region table (global tables); DLQ for failed writes | Regional failover; durable retry |
| Observability | Metrics: p50/p99 redirect latency, 4xx/5xx; click streams | Track UX and abuse; support analytics |
| Infrastructure | API Gateway + Lambda, DynamoDB, Kinesis, S3, CloudFront, WAF | Serverless, cost-efficient at any scale |
Scenario 5: Search Engine (Vertical Site/Search Service)
Goal: Index documents/webpages and provide full-text search with filters, ranking, and autosuggest.
- Ingest & crawl sources
- Index fields + vectors
- Query: keyword + semantic, filters, facets
Data A → Data B
- Data A: raw documents, webpages, titles, body text, metadata, facets, links, and uploaded files.
- Data B: searchable index entries, ranked result sets, highlighted snippets, facets, autosuggest terms, and vector-search candidates.
| Block | Design Choice | Justification |
|---|---|---|
| Source | One crawler fetch, webhook event, or batch upload produces one raw document or document update | Estimate one document first: document ID ~8-16 B, title ~100 B, body text often ~10 KB, facets/tags ~100-500 B, URL ~100-2,000 chars, metadata JSON ~0.5-2 KB. |
| Type | Document corpus across many sources Fields include title, body, URL, facets, metadata, language, and optional embedding |
Type scales one document into a corpus: use ~10-20 KB per raw document for rough sizing; fields support keyword, filter, ranking, and vector search. |
| Storage | OpenSearch/Elastic (inverted index) + vector index; S3 cold store | Storage can rival raw content: 100M docs × 10 KB ≈ 1 TB raw; search index may add ~300 GB-1 TB; 768-dim float vectors add ~307 GB before index overhead. |
| Logic / Transformation | ETL: clean, dedupe, tokenize, embed; incremental indexing | Higher relevance; fast refresh with partial updates |
| Pattern / Access | Search REST: q, filters, sort; autosuggest endpoint | Standard search UX; low-latency responses |
| Presentation | Web UI: search box, facets, highlighting; pagination | Discoverability and relevance feedback |
| Security | Signed requests; per-tenant filter; index-level RBAC | Isolation and least privilege |
| Scalability | Sharded indexes; warm replicas; query cache/CDN for hot queries | Throughput and low tail latency |
| Reliability | Multi-AZ cluster; snapshot to S3; blue/green index swaps | Safe reindex; fast recovery |
| Observability | Metrics: QPS, p99, recall@k/CTR; slow logs; relevancy dashboards | Quality and performance tuning |
| Infrastructure | ECS/EKS crawlers, Lambda ETL, OpenSearch, S3, API Gateway, CloudFront | Managed search + serverless ETL |
Scenario 6: Ride-Sharing (Dispatch & Matching)
Goal: Match riders ↔ drivers in real time with ETA estimates, pricing, and tracking.
- High write (location updates) + low-latency reads (nearby drivers)
- Geo-index + surge pricing
- Trip lifecycle events
Data A → Data B
- Data A: driver locations, rider pickup/dropoff requests, driver availability, trip events, traffic signals, and payment events.
- Data B: nearby driver candidates, match decisions, ETA, price quote, live trip state, route updates, and notifications.
| Block | Design Choice | Justification |
|---|---|---|
| Source | One mobile app emits one event: driver location update, rider request, driver accept, trip status update, or payment event | Estimate one location event first: driver ID ~8 B, lat/lon ~8-16 B, timestamp ~8 B, speed/heading/status ~10-30 B, request metadata ~20-100 B. |
| Type | Regional event stream across active riders and drivers Location events are high-frequency; trip/payment events are lower-frequency but more correctness-sensitive |
Type separates high-volume telemetry from durable lifecycle events: compact location events are ~60-150 B, while JSON events are often ~150-300 B. |
| Storage | Redis/KeyDB (geo sets) for live locations; Postgres for trips/payments; S3 for telemetry | Storage follows frequency: 100K active drivers / 5s = 20K location writes/sec; at ~150 B each ≈ 3 MB/sec, ~259 GB/day before replicas/log overhead. |
| Logic / Transformation | Stream (Kafka): location smoothing, ETA calc, surge pricing; ML for ETA/dispatch | Logic should stay regional: dispatch queries nearby drivers by city/zone rather than scanning the global location set. |
| Pattern / Access | REST: request/cancel trip, quote; WebSocket: live driver ETA/track | Seamless UX for requests + realtime updates |
| Presentation | Mobile map with live driver markers; push notifications | High-frequency updates with low battery impact |
| Security | JWT auth; signed location updates; fraud detection rules | Protects users and platform integrity |
| Scalability | Region-sharded dispatch; partition by city/zone; edge caches for maps/tiles | Reduces cross-region chatter; scales horizontally |
| Reliability | Leader election per region; idempotent trip ops; DLQs for events | Failover and consistent trip lifecycle |
| Observability | Metrics: match time, cancel rate, ETA error; traces for dispatch path | Operational and model quality monitoring |
| Infrastructure | API Gateway + ECS/EKS, Redis Geo, Kafka, Postgres/Aurora, S3, CloudFront, Pinpoint/SNS | Mix of in-memory geo + durable stores |