1. AWS Compute Overview
Professional-level design questions expect you to remember that “compute” isn’t just EC2 boxes. AWS provides a toolkit and expects architects to combine pieces that best fit the workload’s shape.
- Virtual servers (EC2) – still the go-to when full OS control, custom AMIs, or exotic networking is required.
- Containers (ECS, EKS, Fargate) – great for repeatable deployments and for sprinkling Spot capacity into stateless services.
- Serverless functions (Lambda) – perfect glue for events and bursty jobs because idle time literally costs nothing.
- Edge + load balancing – Route 53, CloudFront, ALB/NLB/GWLB, and Global Accelerator decide where the traffic lands and how TLS is handled.
- Fleet automation – Auto Scaling Groups, launch templates, Parameter Store, and Systems Manager keep fleets patched without manual babysitting.
The remaining sections follow a logical progression: start with regional entry points, cover cost levers, tighten up networking, then finish with scaling and inline security.
2. Regional and Global Architecture
High-level designs usually start at the regional entry point and then trace how traffic fans out globally. Key checkpoints:
- Global reach – place workloads close to customers via Regions, edge locations, and CDN nodes.
- Health-aware routing – global health checks and failover policies keep endpoints resilient.
- Scalable ingress – use regional load balancers or Global Accelerator to absorb demand before it reaches your apps.
- Content acceleration – pair compute tiers with CloudFront so static and dynamic responses stay fast worldwide.
3. EC2 Purchase & Savings Strategies
After the region/edge layout is clear, cost strategy becomes the next focus. Picking the right commercial model matters just as much as sizing instances, so the tables below act as a quick reference for when each option shines.
3.1 Standard Purchase Options
| Model | Billing Traits | Best For | Key Watchouts |
|---|---|---|---|
| On-Demand | Per-second billing on shared hardware, no commitments. | Default choice for short-lived, unpredictable, or interruption-intolerant tasks. | Most expensive rate; capacity not reserved. |
| Spot | Per-second billing, deep discounts (up to ~90%) on unused capacity. | Stateless, batch, or flexible jobs that can handle sudden interruptions. | Instances can disappear with little warning; design retry logic. |
| Reserved Instances | Commit to 1 or 3 years for reduced hourly rates; zonal RIs also reserve capacity. | Always-on workloads with predictable instance families/regions. | Locked to attributes (region, tenancy, family). Unused reservations still bill. |
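As a quick feel for the Spot model, a minimal boto3 sketch (region and instance type are arbitrary choices) that pulls the last hour of Spot prices to compare against the On-Demand rate:

```python
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2", region_name="us-east-1")

# Spot prices from the last hour for one instance type, per AZ.
resp = ec2.describe_spot_price_history(
    InstanceTypes=["m5.large"],              # arbitrary example type
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
)
for entry in resp["SpotPriceHistory"]:
    print(entry["AvailabilityZone"], entry["SpotPrice"])
```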
3.2 Dedicated Capacity Choices
| Option | What You Get | Ideal Use | Notes |
|---|---|---|---|
| Dedicated Host | Entire physical server under your control; BYOL-friendly. | Socket/core-licensed software, strict compliance, pinning instances. | You manage host capacity; pair with Host Affinity for placement guarantees. |
| Dedicated Instance | Instance runs on hardware isolated to your account, but AWS manages hosts. | Isolation requirements without host-level management. | No host visibility; pay instance fees with dedicated tenancy surcharge. |
Quick note to future me: a host is a single physical server running VMs, while a rack holds multiple hosts; spread and partition placement groups distribute instances across racks.
3.3 Reserved Instance Flavors
- Scheduled RIs – Reserve recurring blocks (e.g., nightly batch for 5 hours). Minimum 1-year contract (~1,200 hrs/year). Not every region or family supported.
- Regional RIs – Apply discounts to matching instances in any AZ of the region, but do not reserve capacity.
- Zonal RIs – Attach to one AZ; gain both billing discount and capacity reservation.
- On-Demand Capacity Reservations – Guarantee capacity inside an AZ without term commitments. Pair with RIs or Savings Plans for discounts when possible.
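An On-Demand Capacity Reservation is a single API call; a boto3 sketch with placeholder values:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Hold capacity for two instances in one AZ until explicitly cancelled.
resp = ec2.create_capacity_reservation(
    InstanceType="m5.large",         # arbitrary example type
    InstancePlatform="Linux/UNIX",
    AvailabilityZone="us-east-1a",
    InstanceCount=2,
    EndDateType="unlimited",
    InstanceMatchCriteria="open",    # matching launches consume it automatically
)
print(resp["CapacityReservation"]["CapacityReservationId"])
```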
3.4 Savings Plans
- Commit to spend $X/hour for 1 or 3 years to unlock reduced rates.
- Compute Savings Plans apply across EC2, Fargate, and Lambda, regardless of instance family or region.
- EC2 Instance Savings Plans limit flexibility (family + region) but offer deeper discounts.
3.5 Taxi Analogy for Charging Models
| Plan | Plain-English Explanation | Taxi Analogy 🚕 | Commitment | Capacity Guaranteed? | Discount? | When to Use |
|---|---|---|---|---|---|---|
| On-Demand | Pay only while the instance runs. | Hail a taxi whenever needed. | None | ❌ | ❌ | Tests, spikes, unpredictable workloads. |
| Savings Plan | Promise to spend $/hr for discount, flexible usage. | Retainer for any taxi ride at lower price. | 1–3 yrs | ❌ | ✅ | Continuous usage that may change families/regions. |
| Reserved Instance | Lock specific attributes for big discount. | Lease the same taxi full-time. | 1–3 yrs | ❌ (✅ for zonal) | ✅ | Steady 24×7 fleets. |
| Scheduled RI | Recurring reserved blocks. | Pre-book the same taxi for rush hour. | 1 yr | ✅ (during window) | ✅ | Predictable periodic jobs. |
| Capacity Reservation | Keep capacity idle but ready. | Pay driver to wait while you shop. | Open-ended | ✅ | ❌ (unless paired) | Mission-critical or DR kickoffs. |
3.6 Dedicated Options Recap
| Option | Analogy | Discount Potential | Why choose it |
|---|---|---|---|
| Dedicated Instance | Private taxi, company-owned. | ❌ | Compliance isolation without hardware management. |
| Dedicated Host | Lease the whole car. | ✅ (with RIs/SP) | BYOL licensing, socket/core tracking, placement control. |
4. EC2 Networking & Image Strategy
Cost planning is pointless if instances can’t talk or boot quickly, so the next pass is all about ENIs and image hygiene.
4.1 Elastic Network Interfaces (ENIs)
- Scoped to an AZ: an ENI lives in a subnet, so you can only attach it to instances in the same AZ.
- Static identity: primary private IPv4 never changes for the lifetime of the ENI.
- Addresses: support multiple secondary IPv4s, IPv6 assignments, and an Elastic IP per private IPv4 if needed.
- Security constructs: attach multiple security groups, leverage source/destination checks, and gain a unique MAC address (handy for licensed software).
- Attachment patterns: add multiple ENIs per instance (subject to type limits) for different security zones or management planes.
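A minimal boto3 sketch of the multi-ENI pattern with placeholder IDs; the subnet, security group, and instance must sit in the same AZ:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an ENI in a specific subnet (which pins it to that AZ).
eni = ec2.create_network_interface(
    SubnetId="subnet-0123456789abcdef0",   # placeholder
    Groups=["sg-0123456789abcdef0"],       # placeholder security group
    Description="management-plane interface",
)

# Attach it as a secondary interface (DeviceIndex 0 is the primary ENI).
ec2.attach_network_interface(
    NetworkInterfaceId=eni["NetworkInterface"]["NetworkInterfaceId"],
    InstanceId="i-0123456789abcdef0",      # placeholder
    DeviceIndex=1,
)
```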
4.2 Bootstrapping vs. AMI Baking
Application provisioning typically follows three layers:
- Base OS + Dependencies – slowest layer, rarely changes.
- Application Binaries – updated occasionally.
- Runtime Configuration – fast, environment-specific tweaks.
Strategies:
- User Data scripts – customize step 1 or 2 at launch time; flexible but slower to become ready.
- AMI Baking – pre-build golden images with the heavy lifting done; fastest boot but requires pipeline updates when components change.
- Dynamic config (Parameter Store / AppConfig) – keep step 3 lightweight and environment-aware.
Optimal builds usually blend baked AMIs for heavy dependencies plus minimal bootstrapping for secrets, region settings, or last-minute patches.
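For the runtime-configuration layer, a launch-time script can pull values from Parameter Store instead of baking them into the AMI; a minimal boto3 sketch with a hypothetical parameter path:

```python
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

# Fetch environment-specific config at boot; the path is hypothetical.
param = ssm.get_parameter(
    Name="/myapp/prod/db_endpoint",
    WithDecryption=True,   # decrypts SecureString values
)
print(param["Parameter"]["Value"])
```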
5. Elastic Load Balancing
With networking squared away, the next layer is the load balancers that shape every request before it hits compute.
5.1 Architecture & Traffic Flow
- Deploy load balancers across at least two AZs (one subnet per AZ) for resilience; the load balancer tier also decouples the presentation and application tiers.
- Each ELB exposes a DNS name; clients resolve it to per-AZ nodes.
- Listeners inspect protocol, port, and rule sets, then forward to target groups over private networking.
- Targets respond via the node, while health checks continuously verify status.
- Allocate /27-sized subnets (8+ free IPs) so nodes can scale out when demand spikes.
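To see the per-AZ nodes behind an ELB DNS name, a quick Python check (the hostname is a placeholder):

```python
import socket

# Each enabled AZ contributes at least one node IP to the DNS answer.
dns_name = "my-alb-1234567890.us-east-1.elb.amazonaws.com"  # placeholder
for info in socket.getaddrinfo(dns_name, 443, proto=socket.IPPROTO_TCP):
    print(info[4][0])  # one line per load balancer node address
```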
5.2 Cross-Zone Load Balancing
- Without cross-zone load balancing, each node sends traffic only to targets in its own AZ, so an AZ receives an equal share of requests regardless of how many instances run inside it.
- Enabling cross-zone distributes requests evenly across all registered targets in every AZ, preventing single-instance AZs from being overloaded. ALB enables it by default; NLB and GWLB leave it off by default.
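Cross-zone is toggled as a load balancer attribute; a boto3 sketch for an NLB (where it is off by default), with a placeholder ARN:

```python
import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")

# Enable cross-zone distribution on an NLB (placeholder ARN).
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/net/my-nlb/abc123",
    Attributes=[{"Key": "load_balancing.cross_zone.enabled", "Value": "true"}],
)
```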
5.3 User Session State
- Stateful features (shopping carts, wizards, login state) should ideally live in shared stores (Redis, DynamoDB) so any node can serve the next request.
- If server memory must hold state, use stickiness cautiously and plan for disruption during instance rotations.
5.4 Load Balancer Generations
| Type | Protocols | Highlights | Limitations |
|---|---|---|---|
| Classic LB (CLB) | HTTP/HTTPS/TCP | Legacy option, basic Layer 4/7 support. | No SNI, only one SSL cert, minimal app awareness. |
| Application LB (ALB) | HTTP/HTTPS/WebSocket | True Layer 7: content-based routing, multiple certs, WebSockets. | No TCP/UDP listeners; TLS terminates at ALB. |
| Network LB (NLB) | TCP/TLS/UDP | Ultra-low latency, static IP per AZ; TCP passthrough keeps encryption end-to-end. | No HTTP header visibility or cookies. |
| Gateway LB (GWLB) | GENEVE encapsulated L3/L4 | Auto-scales third-party appliances inline. | Requires appliance support for GENEVE. |
5.5 Session Stickiness
- ALB/CLB can inject cookies (e.g., `AWSALB`) lasting 1 second to 7 days to bind a client to a target.
- Stickiness helps when a shared session store isn’t available, but it can skew load or trap users on draining instances.
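Stickiness is configured as target group attributes; a boto3 sketch with a placeholder ARN:

```python
import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")

# Turn on load-balancer-generated cookie stickiness for one day.
elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/abc123",
    Attributes=[
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},
        {"Key": "stickiness.lb_cookie.duration_seconds", "Value": "86400"},
    ],
)
```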
5.6 Connection Draining & Deregistration Delay
- Connection Draining (CLB) – allows 1–3600 seconds for in-flight requests to finish before removing a node.
- Deregistration Delay (ALB/NLB) – configured per target group (default 300s) so existing flows complete gracefully after a scale-in event.
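Deregistration delay is set through the same target group attribute mechanism; a sketch that shortens the 300s default:

```python
import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")

# Give in-flight requests 60s to finish before a target is removed.
elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/abc123",
    Attributes=[{"Key": "deregistration_delay.timeout_seconds", "Value": "60"}],
)
```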
5.7 Client IP Preservation
- For HTTP/HTTPS, ALB adds `X-Forwarded-For`, `X-Forwarded-Proto`, and `X-Forwarded-Port` headers; read the leftmost IP in `X-Forwarded-For` to find the original client.
- For non-HTTP traffic, enable Proxy Protocol (Layer 4). CLB uses v1 (text) and NLB uses v2 (binary) to deliver client source info while keeping TLS intact.
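Extracting the original client from `X-Forwarded-For` is a one-liner; the sample addresses below are made up:

```python
def client_ip(xff_header: str) -> str:
    """Leftmost entry is the original client; proxies append to the right."""
    return xff_header.split(",")[0].strip()

print(client_ip("203.0.113.7, 10.0.1.20, 10.0.2.31"))  # -> 203.0.113.7
```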
6. Auto Scaling Groups (ASG)
All that front-door engineering falls flat without an automated fleet, so this section collects focused ASG reminders.
6.1 Core Concepts
- Use launch templates (preferred) or launch configurations to define AMI, networking, storage, and metadata.
- Set min, desired, and max capacity to bound fleet size.
- ASGs self-heal: failed instances trigger replacement up to desired capacity.
- Operate only inside a VPC but can span multiple subnets/AZs for resilience.
6.2 Scaling Policies
- Manual – operators change desired count directly (no policy required).
- Scheduled – scale for predictable daily/weekly events.
- Dynamic – react to CloudWatch metrics:
- Simple scaling: single threshold → fixed adjustment.
- Step scaling: multiple thresholds map to incremental adjustments.
- Target tracking: maintain a metric near a setpoint (thermostat-style).
- SQS-driven – trigger scaling from queue depth (e.g., `ApproximateNumberOfMessagesVisible`).
- Cooldowns prevent thrashing; customize per policy.
A rough sketch of the feedback loop behind target tracking (the helper functions and capacity counters stand in for real CloudWatch/ASG plumbing):

```python
import time

TARGET_CPU = 50   # target average CPU (%)
DEADBAND = 10     # tolerance band around the target

# get_average_cpu, scale_out, scale_in, and the capacity counters
# are placeholders for the real CloudWatch/ASG plumbing.
while True:
    avg = get_average_cpu(asg_instances)
    error = avg - TARGET_CPU
    if error > DEADBAND and desired_instances < max_instances:
        scale_out(by=1)
        time.sleep(120)   # cooldown only after a scaling action
    elif error < -DEADBAND and desired_instances > min_instances:
        scale_in(by=1)
        time.sleep(120)   # cooldown
    time.sleep(60)        # metric polling interval
```
| Policy | Behavior | Notes |
|---|---|---|
| Simple | One metric threshold, one action. | “If CPU > 70%, add 1 instance.” |
| Step | Multiple bands with different step sizes. | “>90% add 3, >70% add 2…” |
| Target Tracking | Feedback loop aims for target metric. | Works like cruise control; AWS manages math. |
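In practice AWS runs this feedback loop for you; a boto3 sketch that attaches a target-tracking policy to a hypothetical ASG:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Maintain average CPU near 50% across the group (ASG name is a placeholder).
autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-asg",
    PolicyName="cpu-target-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,
    },
)
```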
6.3 Health Checks & Hooks
- Health sources: EC2 status (default), ELB health, or custom integration.
- Grace period: default 300s before first health check to allow bootstrapping.
- Lifecycle hooks pause instances during the `Pending` or `Terminating` transitions so you can run custom scripts (e.g., pre-warm cache, drain sessions). Resume via `CompleteLifecycleAction` or wait for the timeout.
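Once the hook's custom work is done, the instance is released explicitly; a boto3 sketch with placeholder names:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Tell the ASG the hook's work (cache warm, session drain) has finished.
autoscaling.complete_lifecycle_action(
    LifecycleHookName="drain-sessions",        # placeholder hook name
    AutoScalingGroupName="my-asg",             # placeholder ASG
    InstanceId="i-0123456789abcdef0",          # placeholder instance
    LifecycleActionResult="CONTINUE",          # or "ABANDON" to bail out
)
```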
6.4 ASGs with Load Balancers
- Register ASG with target groups; new instances join automatically.
- Optionally rely on load balancer health checks so failing targets are replaced even if EC2 reports “ok.”
- Deregistration delays + lifecycle hooks create graceful scale-in workflows.
7. EC2 Placement Groups
When latency or blast-radius guarantees show up, placement groups keep the topology honest.
| Strategy | Layout | Best For | Constraints |
|---|---|---|---|
| Cluster | Packs instances close together in one AZ. | HPC, tightly coupled apps needing 10 Gbps+ per flow. | No cross-AZ spreading; low resilience. Launch similar instances simultaneously for best results. |
| Spread | Places up to 7 instances per AZ on distinct racks with separate power/network. | Small fleets needing maximum isolation (critical services, HA pairs). | Not available for Dedicated Hosts/Instances; limit 7 per AZ. |
| Partition | Segments racks into partitions; instances within a partition share hardware, but partitions are isolated. | Large-scale distributed systems (Hadoop, Cassandra, Kafka). | Up to 7 partitions per AZ; you assign instances to partitions for failure-domain control. |
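Creating a placement group and launching into it takes two calls; a boto3 sketch using the spread strategy with placeholder IDs:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Spread strategy: each instance lands on a distinct rack.
ec2.create_placement_group(GroupName="critical-pair", Strategy="spread")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="m5.large",           # arbitrary example type
    MinCount=2,
    MaxCount=2,
    Placement={"GroupName": "critical-pair"},
)
```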
8. Gateway Load Balancer (GWLB)
The final piece is inline security. GWLB keeps third-party appliances elastic so next hops are not hard-coded everywhere.
Traditional inspection tiers often require manually chaining firewalls or IDS appliances, creating scaling pain. GWLB changes that by:
- Planting GWLB Endpoints (PrivateLink) inside each subnet, so routing can hairpin through inspection with a single hop.
- Wrapping flows in GENEVE and shipping them to a managed GWLB service that knows about health, scaling, and AZ isolation.
- Treating appliance fleets like any other target group, which means health checks, scaling, and fail-open/fail-closed logic stay consistent.
- Preserving the original source/destination IPs so the security rules stay accurate after inspection.
Use cases:
- Centralized security VPC – route spoke VPC traffic via GWLB endpoints for inspection before reaching workloads.
- Inline egress filtering – send outbound flows through IDS/IPS without hard-wiring next hops.
- Scalable appliances – autoscale firewall AMIs with GWLB controlling attachment and health.
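The "single hop" is just a route whose next hop is the GWLB endpoint; a boto3 sketch with placeholder IDs:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Send all traffic from this route table through the inspection endpoint.
ec2.create_route(
    RouteTableId="rtb-0123456789abcdef0",    # placeholder route table
    DestinationCidrBlock="0.0.0.0/0",
    VpcEndpointId="vpce-0123456789abcdef0",  # GWLB endpoint (placeholder)
)
```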
9. Operations & Exam Notes
- EC2 lifecycle – Reboot keeps the same host and IPs; Stop/Start moves to new hardware with a new public IP (unless EIP) and wipes instance store volumes.
- ENIs – secondary ENIs retain MAC/IP identity across stops; useful for licensing or failover IP swaps.
- ALB refresh – regional service with nodes (ENIs) in each subnet; stickiness is configured at the Target Group, not on the listener.
- Lambda@Edge – deploys Lambda functions to CloudFront PoPs for viewer/origin request/response manipulation (auth headers, redirects, AB tests).
- ECS networking modes – `bridge` (containers NAT behind the host) vs `awsvpc` (each task receives its own ENI; required for Fargate and Service Discovery).
- Systems Manager Run/Session Manager – requires the SSM Agent plus the `AmazonSSMManagedInstanceCore` role and VPC endpoints (`ssm`, `ec2messages`, `ssmmessages`) when there is no Internet access.
- Elastic Beanstalk deployments – All at once (downtime), Rolling, Rolling with additional batch, Immutable, or Blue/Green (CNAME swap). Worker environments use SQS + EC2.
- Elastic Fabric Adapter (EFA) – HPC ENI with OS-bypass (Libfabric). Needs supported instances (e.g., c5n, p4d, hpc6a) + placement group + EFA-enabled AMI.
- Spot Fleet – mix On-Demand and Spot; set baseline On-Demand, choose allocation strategy (lowest price/diversified/capacity-optimized) to hit target capacity at lower cost.
- Lambda concurrency – account concurrency defaults to 1,000 per Region, with Region-specific burst limits; request service quota increases for sustained high concurrency.
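To guarantee headroom for a critical function, reserved concurrency can be set per function; a boto3 sketch with a placeholder name:

```python
import boto3

lam = boto3.client("lambda", region_name="us-east-1")

# Reserve 100 concurrent executions for this function (placeholder name).
lam.put_function_concurrency(
    FunctionName="my-function",
    ReservedConcurrentExecutions=100,
)
```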