A retail site has highly predictable traffic that ramps up at 9 AM and falls at 6 PM daily, and users currently experience slow responses during the morning ramp. Which approach removes the latency gap most directly?

Scheduled or predictive scaling to provision capacity before 9 AM. Scheduled scaling (for known calendar patterns) and predictive scaling (ML-forecasted, up to 48 hours ahead) both add capacity before demand arrives, eliminating the ramp-up latency gap. Target tracking and step scaling are reactive and only respond after the metric has already risen.

An Auto Scaling group spans two AZs with min 2, desired 4, max 8. One AZ fails completely. What does Auto Scaling do?

Launches replacement instances in the healthy AZ to restore the desired count of 4. Auto Scaling detects the instances in the failed AZ as unhealthy and launches replacements in the remaining healthy AZ to maintain the desired count of 4. The load balancer routes all traffic to the healthy AZ, so the application keeps serving users.

A team must deploy a new AMI to an ASG without taking the service offline, replacing instances gradually. What is the correct sequence?

Create a new Launch Template version with the new AMI, set it default, then run an Instance Refresh with a minimum healthy percentage. Launch Templates are versioned. Creating a new version with the new AMI, making it the default, and starting an Instance Refresh with a minimum healthy percentage replaces instances batch by batch with health checks between batches, achieving a zero-downtime rollout.

EC2 Auto Scaling: Dynamic, Predictive, and Scheduled

Key Takeaways

An Auto Scaling group (ASG) maintains a min/desired/max instance count and self-heals by terminating unhealthy instances and launching replacements.
Target tracking is the simplest and AWS-recommended dynamic policy; step scaling reacts in graduated steps to alarm breach size; simple scaling is legacy.
Predictive scaling uses machine learning to forecast load up to 48 hours ahead and provisions before demand; scheduled scaling pre-provisions for known calendar events.
Launch Templates (versioned, support mixed instances and Spot) replace legacy Launch Configurations; Instance Refresh rolls out a new template with zero downtime.
Default health-check grace period is 300 seconds for groups created in the console (0 seconds via the CLI or SDKs), and ELB health checks let the ASG replace instances that pass EC2 status checks but fail the application health check.

Quick Answer: Auto Scaling keeps capacity between a min and max, driving toward a desired count. Use target tracking for simple metric goals, step scaling for graduated reactions, scheduled scaling for known calendar spikes, and predictive scaling to pre-provision forecasted demand. Spread the ASG across multiple AZs for resilience.

Auto Scaling Group Fundamentals

An Auto Scaling group (ASG) manages a fleet of EC2 instances as one logical unit defined by three numbers and a Launch Template.

Parameter	Meaning	Example
Minimum	Floor — never go below	2
Desired	Target the ASG drives toward	4
Maximum	Ceiling — never exceed	10
Launch Template	AMI, instance type, security group, IAM role, user data	lt-0abc...
Subnets	AZs to place instances in	us-east-1a/1b/1c
Health check type	EC2 (status checks) or ELB (target health)	ELB

Self-healing is the headline resilience feature: if an instance fails its health check, the ASG terminates it, launches a replacement from the Launch Template, and registers the new instance with the load balancer, with no human action. Set the health check type to ELB so the ASG also replaces instances that pass low-level EC2 status checks but fail the application health check on the target group. The health-check grace period (300 seconds by default for groups created in the console, 0 via the CLI or SDKs) gives new instances time to boot before failed health checks count against them.
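The self-healing loop described above can be sketched as a single reconcile pass. This is a simplified model rather than the service's actual implementation: the `Instance` class and `reconcile` function are hypothetical names, and scale-in of surplus healthy instances is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    healthy: bool


def reconcile(instances, minimum, desired, maximum):
    """Return (ids_to_terminate, number_to_launch) for one pass."""
    # Keep the target inside the configured floor and ceiling.
    target = max(minimum, min(desired, maximum))
    # Unhealthy instances are terminated, not repaired.
    unhealthy = [i.instance_id for i in instances if not i.healthy]
    healthy_count = len(instances) - len(unhealthy)
    # Launch enough replacements to get back to the target.
    to_launch = max(0, target - healthy_count)
    return unhealthy, to_launch


fleet = [
    Instance("i-a", True),
    Instance("i-b", False),
    Instance("i-c", True),
    Instance("i-d", True),
]
terminate, launch = reconcile(fleet, minimum=2, desired=4, maximum=10)
print(terminate, launch)  # ['i-b'] 1
```

With the example values from the table (min 2, desired 4), one failed instance yields one termination and one launch.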

The Four Scaling Policy Types

1. Target Tracking (recommended)

You pick one metric and a target value; the ASG adds or removes instances to hold it. Example: keep Average CPU utilization at 50%, or ALBRequestCountPerTarget at 1000. AWS auto-creates the CloudWatch alarms and manages cooldown for you.
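To build intuition for how a target value turns into a capacity change, here is a rough proportional model. It is an assumption made for illustration, not the algorithm AWS runs (the real policy is, for example, more cautious when scaling in), and `target_tracking_capacity` is a hypothetical helper.

```python
import math


def target_tracking_capacity(current_capacity, metric_value, target_value,
                             minimum, maximum):
    """Estimate the capacity that would bring the metric back to target."""
    # If the metric is 1.5x the target, roughly 1.5x the capacity is needed.
    raw = math.ceil(current_capacity * metric_value / target_value)
    # The group never leaves its min/max bounds.
    return max(minimum, min(raw, maximum))


# Four instances at 75% average CPU with a 50% target.
print(target_tracking_capacity(4, 75.0, 50.0, minimum=2, maximum=10))  # 6
# Four instances at 25% average CPU with the same target.
print(target_tracking_capacity(4, 25.0, 50.0, minimum=2, maximum=10))  # 2
```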

2. Step Scaling

Reacts in graduated steps to how badly an alarm is breached:

CPU breach	Action
50–70%	+1 instance
70–90%	+2 instances
>90%	+3 instances
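The table above maps directly onto a small lookup function. The sketch below assumes each band includes its lower bound and that exactly 90% still falls in the +2 band; the table itself does not say how boundaries are handled, and `step_adjustment` is a hypothetical name.

```python
def step_adjustment(cpu_percent):
    """Return how many instances to add for a given average CPU reading."""
    if cpu_percent > 90:
        return 3
    if cpu_percent >= 70:
        return 2
    if cpu_percent >= 50:
        return 1
    # Below the alarm threshold: no scale-out step applies.
    return 0


for cpu in (40, 60, 75, 95):
    print(cpu, step_adjustment(cpu))
```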

3. Scheduled Scaling

Changes min/desired/max at a fixed time for known patterns — e.g., raise desired to 20 at 08:00 weekdays, drop to 4 at 20:00. Proactive, not metric-driven.
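The weekday example could be expressed as two recurring scheduled actions. The sketch below only builds the request parameters and does not call AWS; the keys follow the PutScheduledUpdateGroupAction API (exposed in boto3 as `put_scheduled_update_group_action`), while the group name `web-asg`, the action names, and the `scheduled_action` helper are hypothetical.

```python
def scheduled_action(group_name, action_name, cron, desired):
    """Build parameters for one recurring scheduled action."""
    return {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": action_name,
        # Cron recurrence is evaluated in UTC unless a TimeZone is also set.
        "Recurrence": cron,
        "DesiredCapacity": desired,
    }


# 08:00 Monday to Friday: raise desired capacity to 20.
scale_up = scheduled_action("web-asg", "weekday-morning", "0 8 * * 1-5", 20)
# 20:00 Monday to Friday: drop back to 4.
scale_down = scheduled_action("web-asg", "weekday-evening", "0 20 * * 1-5", 4)

print(scale_up)
print(scale_down)
```

The desired capacity in each action still has to fit inside the group's min and max at the time the action runs, so a real setup may need to raise the maximum as well.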

4. Predictive Scaling

Uses machine learning on up to 14 days of metric history to forecast load up to 48 hours ahead, refreshes the forecast every 6 hours, and launches capacity before demand arrives. It is ideal for cyclical daily or weekly traffic.

Simple scaling still exists but is legacy (after each scaling activity it waits out a fixed cooldown before responding to further alarms); avoid it on new designs.

On the Exam: "Users see slow responses every morning during the predictable ramp-up" → scheduled or predictive scaling, because reactive policies (target tracking, step) only act after the metric rises, leaving a latency gap.

Launch Templates and Instance Refresh

Feature	Launch Template	Launch Configuration
Status	Current	Legacy
Versioning	Yes	No
Mixed instance types	Yes	No
Spot + On-Demand mix	Yes	Limited

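To roll out a new AMI safely, create a new template version, set it as default (this takes effect only if the ASG references the $Default version; otherwise point the ASG at the new version), then start an Instance Refresh with a minimum healthy percentage (e.g., 90%). The ASG replaces instances in batches and lets each batch warm up and pass health checks before starting the next, so the service stays available throughout. Do not edit instances in place.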
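To see how the minimum healthy percentage limits each batch, consider this back-of-the-envelope model. It assumes the healthy floor is rounded up and that at least one instance is always replaced per batch; the service's exact rounding may differ, and `refresh_batch_size` is a hypothetical helper.

```python
import math


def refresh_batch_size(group_size, min_healthy_percentage):
    """Estimate how many instances can be out of service at once."""
    # Number of instances that must stay healthy during the refresh.
    must_stay_healthy = math.ceil(group_size * min_healthy_percentage / 100)
    # Always replace at least one instance so the refresh can progress.
    return max(1, group_size - must_stay_healthy)


print(refresh_batch_size(10, 90))  # 1
print(refresh_batch_size(20, 50))  # 10
```

A higher percentage means smaller batches and a slower but safer rollout; a lower one finishes faster with less spare capacity during the refresh.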

Cooldowns, Warm-Up, and Cost

The scaling cooldown (default 300 seconds, used by simple scaling) blocks further actions until the previous one settles; target tracking uses instance warm-up instead. Mixed-instances policies with Spot Instances cut cost dramatically for fault-tolerant fleets, while On-Demand base capacity protects against Spot interruption. Lifecycle hooks let you pause launch or terminate to run bootstrap or drain logic.

Termination Policy, AZ Rebalancing, and Lifecycle Hooks

When the ASG scales in, the termination policy decides which instance dies. The default tries to keep AZs balanced, then removes the instance with the oldest Launch Template/Configuration, then the one closest to the next billing hour — you can override with custom policies. Availability Zone rebalancing is why the ASG sometimes briefly launches in one AZ before terminating in another after an AZ recovers: it restores even spread for resilience, accepting a short period above desired capacity.
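The default ordering described above can be approximated in a few lines. This is a simplified sketch: the `Instance` fields and `pick_instance_to_terminate` are hypothetical, a lower template version number stands in for "oldest", and AZs tied for the most instances are simply pooled together.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    az: str
    template_version: int
    minutes_to_billing_hour: int


def pick_instance_to_terminate(instances):
    """Pick one scale-in victim using a simplified default policy."""
    # Step 1: only consider the AZ(s) holding the most instances.
    az_counts = Counter(i.az for i in instances)
    busiest = max(az_counts.values())
    candidates = [i for i in instances if az_counts[i.az] == busiest]
    # Steps 2 and 3: oldest template first, then nearest billing hour.
    return min(
        candidates,
        key=lambda i: (i.template_version, i.minutes_to_billing_hour),
    )


fleet = [
    Instance("i-1", "us-east-1a", 2, 10),
    Instance("i-2", "us-east-1a", 1, 40),
    Instance("i-3", "us-east-1a", 1, 5),
    Instance("i-4", "us-east-1b", 1, 1),
]
print(pick_instance_to_terminate(fleet).instance_id)  # i-3
```

Here i-4 is spared despite being nearest its billing hour, because removing it would make the AZ imbalance worse.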

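Lifecycle hooks pause an instance in a Pending:Wait or Terminating:Wait state so you can run setup (install software, register with a config system) before it goes into service, or drain connections and ship logs before it terminates. If nothing completes the hook, it ends when the heartbeat timeout expires (3600 seconds by default) and the hook's default result is applied: ABANDON unless you configure CONTINUE, which on launch means the instance is terminated rather than put into service. Always send CompleteLifecycleAction when your work is done.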
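What happens when a hook ends can be modeled as a small decision function. The sketch assumes the hook's default result is ABANDON unless configured otherwise, and that a launching instance which is abandoned gets terminated; `hook_outcome` and its state strings are hypothetical simplifications of the real lifecycle states.

```python
def hook_outcome(transition, completed_with=None, default_result="ABANDON"):
    """Return the instance's next state once the hook ends."""
    if transition not in ("launching", "terminating"):
        raise ValueError(f"unknown transition: {transition}")
    # An explicit completion wins; otherwise the timeout applies the default.
    result = completed_with if completed_with is not None else default_result
    if transition == "terminating":
        # Termination proceeds under either result.
        return "Terminated"
    return "InService" if result == "CONTINUE" else "Terminated"


print(hook_outcome("launching", completed_with="CONTINUE"))  # InService
print(hook_outcome("launching"))  # Terminated
```

The second call shows why an explicit completion matters: a launch hook left to time out under the assumed default discards the new instance.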

Scaling on the Right Metric

CPU is the textbook target-tracking metric, but it is often wrong. For a fleet behind an ALB, ALBRequestCountPerTarget scales directly on user load and avoids the lag of CPU. For queue workers, scale on SQS ApproximateNumberOfMessagesVisible (often via a target-tracking custom metric of backlog-per-instance). For memory-bound apps, publish a custom CloudWatch metric since EC2 does not expose memory natively.

Workload	Best scaling signal
Web fleet behind ALB	ALBRequestCountPerTarget
Queue consumers	SQS messages visible / backlog per instance
CPU-bound compute	Average CPU utilization
Memory-bound app	Custom memory metric via CloudWatch agent
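For the queue-consumer row, the backlog-per-instance idea reduces to simple arithmetic. The sketch below assumes you know the average processing time per message and the latency you can tolerate; `desired_workers` and its parameters are hypothetical names, and the figures are illustrative.

```python
import math


def desired_workers(messages_visible, acceptable_latency_ms,
                    avg_processing_ms, minimum, maximum):
    """Estimate how many consumers keep queue latency acceptable."""
    # Messages one instance can hold in backlog and still meet the latency.
    acceptable_backlog = acceptable_latency_ms / avg_processing_ms
    needed = math.ceil(messages_visible / acceptable_backlog)
    return max(minimum, min(needed, maximum))


# 1500 visible messages, 100 ms per message, 10 s acceptable latency:
# each instance can carry a backlog of 100 messages.
print(desired_workers(1500, 10_000, 100, minimum=1, maximum=50))  # 15
```

Publishing backlog per instance as a custom metric and targeting the acceptable backlog (100 here) lets a target tracking policy do this calculation continuously.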

High Availability Through the ASG

Resilience comes from spreading the ASG across at least two, ideally three, AZs and letting Auto Scaling restore the desired count after any AZ or instance failure. Combine this with an ALB doing health checks, ELB health-check type on the ASG, and stateless instances (session and data stored in DynamoDB, ElastiCache, RDS, or S3) so any instance can be replaced freely. That combination — multi-AZ ASG + ELB + externalized state — is the canonical "highly available, self-healing web tier" answer the SAA-C03 exam rewards, and it pairs directly with the disaster-recovery and Multi-AZ topics in the rest of this domain.
