AWS Architecture Best Practices

Concise, practical notes for designing AWS systems that are reliable, secure, and cost-efficient, written at the level expected for the AWS Certified Solutions Architect - Professional certification.

1. Network Architecture

1.1 Designing a VPC

Think of your VPC as a city — each subnet is a district with its own purpose.

Subnet Type	Purpose	Example Components
Public	Internet-facing resources	ALB, NAT Gateway, Bastion Host
Private	Internal app and API layers	EC2, ECS Tasks
Database	Isolated storage layer	RDS, Aurora
Management	Monitoring and admin tools	Prometheus, Grafana

Example layout:

/16 VPC (65,536 IPs)
├── /20 public subnets – across 3 AZs
├── /20 private subnets – across 3 AZs
├── /24 database subnets – across 3 AZs
└── /24 management subnets – across 3 AZs
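
The layout above can be sketched with Python's standard `ipaddress` module. The `10.0.0.0/16` range and the helper names `plan_subnets` and `usable_ips` are assumptions for illustration; any non-overlapping private range can be carved the same way.

```python
import ipaddress
from itertools import islice

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves 5 addresses in every subnet


def plan_subnets(vpc_cidr="10.0.0.0/16", az_count=3):
    """Carve a VPC into per-AZ subnets: /20 public and private, /24 database and management."""
    vpc = ipaddress.ip_network(vpc_cidr)
    large = vpc.subnets(new_prefix=20)          # iterator over consecutive /20 blocks
    plan = {
        "public": list(islice(large, az_count)),
        "private": list(islice(large, az_count)),
    }
    small = next(large).subnets(new_prefix=24)  # split the next /20 into /24 blocks
    plan["database"] = list(islice(small, az_count))
    plan["management"] = list(islice(small, az_count))
    return plan


def usable_ips(subnet):
    """Addresses left for workloads after the AWS-reserved ones."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    for tier, subnets in plan_subnets().items():
        for az_index, subnet in enumerate(subnets):
            print(f"{tier:<11} az-{az_index}  {subnet}  usable={usable_ips(subnet)}")
```

Allocating the large blocks first and the small ones from a single later /20 keeps every range aligned and leaves the rest of the /16 free for future tiers.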

Design tips:

Use at least two AZs (three preferred).
Keep cross-AZ traffic low (reduces latency and cost).
Use NAT Gateways (ideally one per AZ) for outbound access from private subnets.
Use VPC Endpoints to reach AWS services privately.

1.2 Network Security

Control	Scope	Behavior
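Security Groups	Instance-level (attached to network interfaces)	Stateful; allow rules only, return traffic is permitted automatically.
NACLs	Subnet-level	Stateless; allow and deny rules, evaluated in rule-number order.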

Additional practices:

Transit Gateway → central routing for multiple VPCs.
PrivateLink / VPC Endpoints → avoid exposing services to the internet.
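
The stateful versus stateless distinction can be illustrated with a small model. This is a simplified sketch, not how AWS implements either control: `NaclRule`, `nacl_permits`, and `security_group_permits` are hypothetical names, and only ports are modelled (no protocols or CIDR ranges).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int       # lower numbers are evaluated first
    first_port: int
    last_port: int
    allow: bool


def nacl_permits(rules, port):
    """Stateless check: the first matching rule (by number) wins, otherwise implicit deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.first_port <= port <= rule.last_port:
            return rule.allow
    return False


def security_group_permits(allowed_ports, port, is_reply=False):
    """Stateful check: replies to permitted traffic pass; otherwise an allow rule must match."""
    return is_reply or port in allowed_ports


# An HTTPS request arrives on 443; the response leaves towards the client's ephemeral port.
nacl_inbound = [NaclRule(100, 443, 443, allow=True)]
nacl_outbound = [NaclRule(100, 1024, 65535, allow=True)]  # needed because NACLs are stateless

request_ok = nacl_permits(nacl_inbound, 443) and security_group_permits({443}, 443)
reply_ok = nacl_permits(nacl_outbound, 50000) and security_group_permits({443}, 50000, is_reply=True)
```

Removing the outbound NACL rule would drop the reply even though the security group still allows it, which is the practical consequence of the stateless behavior.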

2. High Availability (HA)

2.1 Application HA

ALB (L7) → content-based routing (host and path), TLS termination, sticky sessions.
NLB (L4) → static IPs, ultra-low latency.
Auto Scaling → multi-AZ distribution, health checks, rolling updates.

2.2 Database HA

RDS Multi-AZ → synchronous standby and failover.
Read Replicas → async scaling for reads.
RDS Proxy → efficient connection pooling.
Add application-level retries and DNS-aware failover handling: after a failover, clients should reconnect and re-resolve the database endpoint.
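
The retry recommendation can be sketched as exponential backoff with jitter. This is a minimal sketch: `call_with_retry` is a hypothetical helper, and treating `ConnectionError` as the transient failure is an assumption; a real database driver raises its own exception types.

```python
import random
import time


def call_with_retry(operation, attempts=5, base_delay=0.2, max_delay=5.0, sleep=time.sleep):
    """Run operation(), retrying transient connection errors with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                                   # out of attempts: surface the error
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))           # jitter avoids synchronized retries
```

Passing a function that opens a fresh connection on each call means every retry resolves the endpoint again, so the client can pick up the promoted standby after a failover.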

3. Redundancy & Disaster Recovery

3.1 Storage Redundancy

S3 → cross-region replication.
EBS → snapshots and lifecycle policies.
EFS → Regional file systems store data redundantly across multiple AZs.
RDS → automated backups (retention configurable up to 35 days).

3.2 DR Strategies

Strategy	Description	RTO/RPO (relative)
Backup & Restore	Rebuild infrastructure from backups	High
Pilot Light	Minimal standby infrastructure	Medium
Warm Standby	Scaled-down live copy	Low
Multi-site Active	Full duplication across regions	Very Low (highest cost)

4. Security Architecture

4.1 Identity & Access

Use IAM roles, not static keys.
Enforce MFA and least-privilege policies.
Use AssumeRole for cross-account access.
Leverage service-linked roles for AWS services.
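
For cross-account access, the role in the target account needs a trust policy that names the trusted account. The sketch below builds such a policy document as plain data. The helper name `cross_account_trust_policy` and the account ID `111122223333` (a placeholder) are assumptions; the MFA condition is optional and shown only to tie in the MFA point above.

```python
import json


def cross_account_trust_policy(trusted_account_id, require_mfa=True):
    """Build a trust policy letting principals in another account assume this role."""
    if not (trusted_account_id.isdigit() and len(trusted_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if require_mfa:
        # Only allow the call when the caller authenticated with MFA.
        statement["Condition"] = {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    print(json.dumps(cross_account_trust_policy("111122223333"), indent=2))
```

The trust policy only controls who may assume the role; what the role can do is still governed by its separate, least-privilege permissions policy.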

4.2 Encryption

At rest → S3, EBS, RDS, EFS encryption.
In transit → TLS 1.2 or higher.
KMS → centralized key management.
ACM → automatic certificate management.
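
On the client side, the TLS 1.2 minimum can be enforced in application code as well as on the load balancer. A minimal sketch using Python's standard `ssl` module; `strict_tls_context` is a hypothetical helper name.

```python
import ssl


def strict_tls_context():
    """Client-side TLS context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context()            # verifies certificates and hostnames
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The returned context can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=strict_tls_context())`.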

5. Performance Optimization

Use Graviton instances or right-size with Compute Optimizer.
Prefer gp3 over gp2 for EBS.
Apply S3 lifecycle rules → move cold data to Glacier.
Reduce data transfer costs by staying in-region and private.
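
The S3 lifecycle rule can be expressed as a configuration document. The sketch below only builds the data structure; the `logs/` prefix, the 90-day threshold, and the helper name `cold_data_lifecycle` are assumptions chosen for illustration.

```python
def cold_data_lifecycle(prefix="logs/", days_to_glacier=90):
    """Build an S3 lifecycle configuration that moves objects under a prefix to Glacier."""
    if days_to_glacier < 0:
        raise ValueError("days_to_glacier must not be negative")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_to_glacier, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


# With boto3 installed and credentials configured, the document could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-example-bucket",            # hypothetical bucket name
#       LifecycleConfiguration=cold_data_lifecycle(),
#   )
```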

6. Cost Optimization

Continuously right-size using CloudWatch + Compute Optimizer.
Mix On-Demand, Reserved, and Spot instances.
Archive cold data with S3 Glacier.
Set up Budgets, Cost Explorer, and tagging for governance.

7. Monitoring & Observability

CloudWatch → metrics, alarms, dashboards.
CloudWatch Logs Insights → query logs across log groups.
X-Ray → distributed tracing.
CloudWatch Synthetics → proactive canary checks.
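
As an illustration of a CloudWatch alarm, the sketch below assembles the parameters for a CPU alarm on a single EC2 instance. The 80% threshold, the three five-minute evaluation periods, the instance ID, and the helper name `cpu_alarm_params` are all assumptions, not recommended values.

```python
def cpu_alarm_params(instance_id, threshold_percent=80.0, periods=3):
    """Build keyword arguments for a CloudWatch alarm on average EC2 CPU utilization."""
    if not 0 < threshold_percent <= 100:
        raise ValueError("threshold_percent must be in (0, 100]")
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                      # seconds per datapoint
        "EvaluationPeriods": periods,       # consecutive breaching datapoints required
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
    }


# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm_params("i-0123456789abcdef0"))
```

Requiring several consecutive periods trades a slower alert for fewer false alarms on short spikes.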

8. Deployment & Operations

Define infrastructure as code with CloudFormation or CDK.
Automate pipelines via CodePipeline, CodeBuild, CodeDeploy.
Deployment strategies:
- Blue/Green → minimal downtime
- Canary → gradual rollout
- Rolling → phased updates
- Immutable → brand-new instances
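
Two of these strategies reduce to simple scheduling logic. The sketch below is a toy model only: `rolling_batches` and `canary_weights` are hypothetical helpers, and real deployments (for example with CodeDeploy) also gate each step on health checks.

```python
def rolling_batches(instances, batch_size):
    """Rolling: split the fleet into groups that are updated one batch at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [instances[i:i + batch_size] for i in range(0, len(instances), batch_size)]


def canary_weights(initial_percent, step_percent):
    """Canary: share of traffic sent to the new version at each rollout step."""
    if initial_percent <= 0 or step_percent <= 0:
        raise ValueError("percentages must be positive")
    weights = []
    percent = initial_percent
    while percent < 100:
        weights.append(percent)
        percent += step_percent
    weights.append(100)                     # final step: all traffic on the new version
    return weights


batches = rolling_batches(["i-1", "i-2", "i-3", "i-4", "i-5"], batch_size=2)
weights = canary_weights(initial_percent=10, step_percent=30)
```

Smaller batches and smaller canary steps limit the blast radius of a bad release at the cost of a longer rollout.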

9. Common Patterns

Pattern	AWS Services	Notes
Microservices	API Gateway + ECS/EKS + SQS/SNS	Async, scalable design
Serverless	Lambda + API Gateway	Pay per use
Event-Driven	SQS, SNS, EventBridge	Decoupled services

10. Design Principles (Rules of Thumb)

10.1 Availability Zones (AZs)

Nominal AZs = the number of AZs you size capacity against: every available AZ except one, which is treated as a buffer for fault tolerance.

Formula:

Nominal AZs = Total AZs - 1
Instances per AZ = Required Instances ÷ Nominal AZs (round up)

Example:

You’re in a region with 6 AZs and your app needs 5 EC2 instances.

Nominal AZs = 6 - 1 = 5
Instances per AZ = 5 ÷ 5 = 1 instance per AZ

Deploy that count in all 6 AZs (6 instances in total). If one AZ fails, the 5 remaining AZs still run the 5 required instances, so capacity is preserved.
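
The rule of thumb can be checked with a few lines of arithmetic. This sketch assumes the per-AZ count is deployed in every available AZ, so that losing any single AZ still leaves the required capacity; `plan_az_capacity` is a hypothetical helper name.

```python
import math


def plan_az_capacity(total_azs, required_instances):
    """Size per-AZ capacity so the workload survives the loss of one AZ."""
    if total_azs < 2:
        raise ValueError("at least two AZs are needed to tolerate an AZ failure")
    nominal_azs = total_azs - 1
    per_az = math.ceil(required_instances / nominal_azs)   # round up partial instances
    return {
        "nominal_azs": nominal_azs,
        "instances_per_az": per_az,
        "deployed_total": per_az * total_azs,
        "after_one_az_failure": per_az * nominal_azs,
    }


if __name__ == "__main__":
    print(plan_az_capacity(total_azs=6, required_instances=5))
    print(plan_az_capacity(total_azs=3, required_instances=5))
```

With fewer AZs the buffer becomes more expensive: the same 5-instance requirement in a 3-AZ region means 3 instances per AZ and 9 deployed in total.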

10.2 Subnets per Tier

Total subnets = (Number of Tiers) × (Number of AZs), i.e. one subnet per tier in each AZ.

Example:

2 tiers (app + DB) × 3 AZs = 6 subnets

10.3 Tiering Logic

Traditional → 3-tier (Presentation / Logic / Data).
In AWS → let requirements (isolation, routing, compliance) decide the number of tiers instead of defaulting to three.
Private DB subnets → isolate for compliance and security.
More subnets = better control, not automatically higher HA.

11. Best Practices Summary

Principle	Why It Matters
Design for failure	Expect AZ or instance failure — assume things will break and plan redundancy.
Implement elasticity	Scale with demand using Auto Scaling and managed services.
Automate with IaC	Use CloudFormation or CDK to reduce manual errors and enforce consistency.
Monitor everything	Visibility ensures reliability — track metrics, logs, and alarms proactively.
Optimize for cost	Efficiency drives sustainability — right-size, schedule, and review usage regularly.

Common pitfalls:

Running only in one AZ → single point of failure.
Over-provisioning → wasted cost.
Security added late → higher risk.
No observability → blind troubleshooting.
Tight coupling → poor scalability.