5.4 Architecture Best Practices and Design Principles

Key Takeaways

  • Design for failure: assume everything will fail and architect for automatic recovery using Multi-AZ, Auto Scaling, and health checks.
  • Loose coupling means components interact through well-defined interfaces (APIs, queues) so failures do not cascade.
  • Horizontal scaling (adding more instances) is preferred over vertical scaling (getting a bigger instance) for reliability.
  • Serverless architectures remove the operational burden of managing servers and scale automatically with demand.
  • The AWS Well-Architected Tool helps you review workloads against best practices from the six pillars.
Last updated: March 2026


Design for Failure

"Everything fails, all the time." — Werner Vogels, CTO of Amazon

This philosophy drives AWS architecture best practices. Build systems that expect and handle failures gracefully.

Strategies for Fault Tolerance

Strategy | Implementation
Multi-AZ Deployment | Deploy resources across multiple AZs for resilience
Auto Scaling | Automatically replace failed instances
Health Checks | ELB and Route 53 detect and route away from unhealthy resources
Backup and Recovery | Automated backups, snapshots, cross-Region replication
Decoupled Architecture | Use SQS/SNS so component failures do not cascade
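
The health-check row above can be illustrated with a small local simulation. This is only a sketch of the idea, not how ELB or Route 53 are implemented: the Target class, the route function, and the instance and AZ names are all hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Target:
    """A hypothetical backend instance registered with a load balancer."""
    name: str
    az: str
    healthy: bool = True


def healthy_targets(targets):
    """Return only the targets that currently pass their health check."""
    return [t for t in targets if t.healthy]


def route(targets, request_count):
    """Round-robin requests over healthy targets; raise if none remain."""
    pool = healthy_targets(targets)
    if not pool:
        raise RuntimeError("no healthy targets in any AZ")
    rr = cycle(pool)
    return [next(rr).name for _ in range(request_count)]


# Three instances, one per AZ (names are made up for illustration).
fleet = [
    Target("web-1", "az-a"),
    Target("web-2", "az-b"),
    Target("web-3", "az-c"),
]
fleet[0].healthy = False  # simulate a failure in az-a
assignments = route(fleet, 4)
```

Because web-1 is marked unhealthy, every request in assignments goes to web-2 or web-3; traffic keeps flowing as long as at least one AZ still has a healthy target.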

Loose Coupling

Loose coupling means components interact through well-defined interfaces (APIs, message queues, event buses) rather than direct dependencies. If one component fails, others continue to function.

Tightly Coupled vs. Loosely Coupled

Tightly Coupled | Loosely Coupled
Components directly call each other | Components communicate via queues or APIs
Failure in one = failure in all | Failure in one is isolated
Scaling requires scaling all components | Components scale independently
Changes require coordination | Components can be updated independently

AWS services for loose coupling:

  • Amazon SQS — Message queues between components
  • Amazon SNS — Pub/sub notifications
  • Amazon EventBridge — Event-driven architecture
  • AWS Step Functions — Workflow orchestration
  • Amazon API Gateway — API interface between components
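
To make the queue-based pattern concrete, here is a minimal local sketch. It uses Python's standard-library queue.Queue as a stand-in for a service such as Amazon SQS (an assumption: a real queue is durable and reached over the network), and the producer, consumer, and flaky functions are hypothetical names.

```python
import queue

# In-memory stand-in for a message queue such as Amazon SQS.
orders = queue.Queue()


def producer(order_ids):
    """The front end only enqueues work; it never calls the worker directly."""
    for order_id in order_ids:
        orders.put(order_id)


def consumer(process, max_messages):
    """Drain up to max_messages; a failed message is put back for a retry."""
    done = []
    for _ in range(max_messages):
        try:
            order_id = orders.get_nowait()
        except queue.Empty:
            break
        try:
            process(order_id)
            done.append(order_id)
        except Exception:
            orders.put(order_id)  # leave it on the queue for a later attempt
    return done


def flaky(order_id):
    """Hypothetical worker logic that crashes on one specific order."""
    if order_id == 2:
        raise ValueError("worker crashed on order 2")


producer([1, 2, 3])  # enqueueing succeeds regardless of worker health
first_pass = consumer(flaky, max_messages=3)
second_pass = consumer(lambda order_id: None, max_messages=3)
```

The worker's failure on order 2 does not affect the producer or the other orders: the failed message stays on the queue and is handled on the second pass.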

Elasticity and Scalability

Horizontal vs. Vertical Scaling

Aspect | Horizontal Scaling (Scale Out/In) | Vertical Scaling (Scale Up/Down)
Method | Add/remove more instances | Get a bigger/smaller instance
Limit | Virtually unlimited | Limited by largest instance type
Downtime | None (add instances behind load balancer) | Often requires restart
Availability | Better (multiple instances = no SPOF) | Single point of failure
Example | 4 x t3.large instead of 1 x m5.4xlarge | Upgrade from t3.large to m5.4xlarge

On the Exam: AWS generally favors horizontal scaling because it is more resilient (no single point of failure) and offers virtually unlimited scaling. Questions about scalability and availability usually point toward horizontal scaling + load balancing.
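
The resilience argument can be checked with simple arithmetic. The sketch below assumes instances fail independently and uses an illustrative 99% per-instance availability; neither assumption is an AWS figure or SLA, and both function names are made up for this example.

```python
def fleet_availability(instance_availability, instance_count):
    """Probability that at least one of N independent instances is up."""
    return 1 - (1 - instance_availability) ** instance_count


def capacity_after_failures(instance_count, failed):
    """Fraction of capacity left after `failed` equal-sized instances are lost."""
    return (instance_count - failed) / instance_count


single = fleet_availability(0.99, 1)  # one large instance
four = fleet_availability(0.99, 4)    # four smaller instances

one_of_four_lost = capacity_after_failures(4, 1)
one_of_one_lost = capacity_after_failures(1, 1)
```

Under these assumptions, the chance that at least one of four instances is up is roughly 99.999999%, compared with 99% for a single instance. Losing one of four instances leaves 75% of capacity, while losing the only instance leaves none.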


Serverless Architecture

Serverless means you do not provision or manage any servers. The servers still exist, but AWS handles provisioning, scaling, patching, and availability.

Common Serverless Architecture

Component | Service
API endpoint | API Gateway
Business logic | Lambda
Data storage | DynamoDB
File storage | S3
Authentication | Cognito
Notifications | SNS
Workflow | Step Functions

Benefits of serverless:

  • No server management
  • Automatic scaling
  • Pay-per-use pricing
  • Built-in high availability
  • Reduced operational overhead
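
As a rough illustration of the business-logic piece, here is a minimal sketch of a Lambda-style handler that returns a response in the shape used by API Gateway's Lambda proxy integration (statusCode, headers, body). The ITEMS dictionary is a hypothetical in-memory stand-in for a DynamoDB table so the example runs locally, and the /items/{id} route is an assumption.

```python
import json

# Hypothetical in-memory table standing in for DynamoDB; a deployed
# function would read from a real data store instead.
ITEMS = {"42": {"id": "42", "name": "example"}}


def handler(event, context):
    """Handle a GET /items/{id} request delivered as a proxy-style event."""
    item_id = (event.get("pathParameters") or {}).get("id")
    item = ITEMS.get(item_id)
    if item is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(item),
    }


# Local invocation with hand-built events (no AWS account needed).
found = handler({"pathParameters": {"id": "42"}}, None)
missing = handler({"pathParameters": None}, None)
```

The function holds no server state and contains no scaling logic; in a deployed setup those concerns are left to the platform.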

High Availability Patterns

Multi-AZ Architecture

Deploy resources across multiple AZs within a Region:

  • ELB distributes traffic across AZs
  • EC2 instances in multiple AZs
  • RDS Multi-AZ for database failover
  • S3 automatically stores data across multiple AZs (at least three for most storage classes; the One Zone classes are the exception)
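
A quick way to see why spreading instances matters is to place them round-robin and count what survives an AZ outage. This is a simplified sketch, not the actual Auto Scaling placement algorithm; the function names and AZ labels are hypothetical.

```python
from collections import Counter


def spread_across_azs(instance_count, azs):
    """Assign instances to AZs round-robin so counts differ by at most one."""
    return [azs[i % len(azs)] for i in range(instance_count)]


def surviving_instances(placement, failed_az):
    """Count instances that remain if one whole AZ becomes unavailable."""
    return sum(1 for az in placement if az != failed_az)


placement = spread_across_azs(5, ["az-a", "az-b", "az-c"])
per_az = Counter(placement)
left_after_outage = surviving_instances(placement, "az-a")
```

With five instances over three AZs, the placement is 2/2/1, so losing az-a still leaves three instances serving traffic.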

Multi-Region Architecture

For the highest level of fault tolerance and disaster recovery:

  • Route 53 failover routing between Regions
  • S3 Cross-Region Replication
  • DynamoDB Global Tables
  • Aurora Global Database

Disaster Recovery Strategies

Strategy | RTO/RPO | Cost | Description
Backup & Restore | Hours | $ | Back up data, restore when needed
Pilot Light | 10s of minutes | $$ | Core infrastructure running, scale up when needed
Warm Standby | Minutes | $$$ | Scaled-down version running, scale up when needed
Multi-Site Active/Active | Near zero | $$$$ | Full production in multiple Regions

On the Exam: Know the four DR strategies and their tradeoffs. Backup & Restore is cheapest but slowest. Multi-Site is fastest but most expensive.
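
The cost versus recovery-time tradeoff can be expressed as a small selection rule: pick the cheapest strategy that still meets the recovery-time target. In the sketch below, the RTO values in minutes are illustrative assumptions chosen to match the rough magnitudes above (hours, tens of minutes, minutes, near zero), not official AWS figures, and cheapest_strategy is a hypothetical helper.

```python
# (name, assumed worst-case RTO in minutes, relative cost tier 1-4)
# The minute values are illustrative assumptions, not AWS figures.
DR_STRATEGIES = [
    ("Backup & Restore", 24 * 60, 1),
    ("Pilot Light", 60, 2),
    ("Warm Standby", 10, 3),
    ("Multi-Site Active/Active", 1, 4),
]


def cheapest_strategy(max_rto_minutes):
    """Return the lowest-cost strategy whose assumed RTO meets the target."""
    candidates = [s for s in DR_STRATEGIES if s[1] <= max_rto_minutes]
    if not candidates:
        return None  # no listed strategy is fast enough
    return min(candidates, key=lambda s: s[2])[0]


relaxed = cheapest_strategy(48 * 60)  # recovery within two days is acceptable
tight = cheapest_strategy(30)         # must recover within 30 minutes
```

With these assumed numbers, a two-day target selects Backup & Restore, while a 30-minute target rules out the two cheapest options and selects Warm Standby.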


Key Architecture Questions for the Exam

Question Pattern | Think About
"Most highly available" | Multi-AZ, Auto Scaling, ELB
"Most cost-effective" | Right-sizing, Savings Plans, Spot, serverless
"Most secure" | Least privilege, encryption, private subnets
"Decouple components" | SQS, SNS, EventBridge
"Reduce operational overhead" | Managed services, serverless
"Global low latency" | CloudFront, Route 53 latency routing, Global Accelerator
Test Your Knowledge

Which of the following is an example of loose coupling in cloud architecture?


Why does AWS generally recommend horizontal scaling over vertical scaling?


Which disaster recovery strategy provides the LOWEST cost but the LONGEST recovery time?


A company wants to build a REST API with no server management, automatic scaling, and pay-per-request pricing. Which combination of services should they use?


Which TWO strategies help achieve high availability for a web application on AWS? (Select TWO)

  • Deploy all resources in a single Availability Zone
  • Use Elastic Load Balancing to distribute traffic across multiple AZs
  • Use the largest possible EC2 instance type
  • Deploy EC2 instances across multiple Availability Zones with Auto Scaling
  • Store all data on EC2 Instance Store