2.2 EC2 Auto Scaling — Dynamic, Predictive, and Scheduled

Key Takeaways

  • An Auto Scaling group (ASG) maintains a min/desired/max instance count and self-heals by terminating unhealthy instances and launching replacements.
  • Target tracking is the simplest and AWS-recommended dynamic policy; step scaling reacts in graduated steps to alarm breach size; simple scaling is legacy.
  • Predictive scaling uses machine learning to forecast load up to 48 hours ahead and provisions before demand; scheduled scaling pre-provisions for known calendar events.
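  • Launch Templates (versioned, support mixed instance types and Spot) replace legacy Launch Configurations; Instance Refresh rolls a new template version across the group in batches, which avoids downtime as long as enough healthy capacity stays in service.
  • The health-check grace period defaults to 300 seconds when the group is created in the console (0 seconds via the CLI or SDK), and ELB health checks let the ASG replace instances that pass EC2 status checks but fail the application health check.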
Last updated: June 2026

Quick Answer: Auto Scaling keeps capacity between a min and max, driving toward a desired count. Use target tracking for simple metric goals, step scaling for graduated reactions, scheduled scaling for known calendar spikes, and predictive scaling to pre-provision forecasted demand. Spread the ASG across multiple AZs for resilience.

Auto Scaling Group Fundamentals

An Auto Scaling group (ASG) manages a fleet of EC2 instances as one logical unit defined by three numbers and a Launch Template.

Parameter | Meaning | Example
Minimum | Floor: never go below | 2
Desired | Target the ASG drives toward | 4
Maximum | Ceiling: never exceed | 10
Launch Template | AMI, instance type, security group, IAM role, user data | lt-0abc...
Subnets | AZs to place instances in | us-east-1a/1b/1c
Health check type | EC2 (status checks) or ELB (target health) | ELB

Self-healing is the headline resilience feature: if an instance fails its health check, the ASG terminates it, launches a replacement from the Launch Template, and registers the new instance with the load balancer, with no human action. Set the health check type to ELB so the ASG also replaces instances that pass low-level EC2 status checks but fail the application health check on the target group. The health-check grace period (300 seconds by default when the group is created in the console) gives new instances time to boot: health-check failures during that window do not trigger replacement.
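The self-healing behavior described above can be pictured as a small reconcile loop. The sketch below is a pure-Python simulation, not AWS code: the `Instance` class, the `launch` stand-in, and the `i-sim...` IDs are all hypothetical, and the real service also handles grace periods, AZ placement, and load balancer registration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


_ids = count(1)


def launch() -> Instance:
    """Hypothetical stand-in for launching from the Launch Template."""
    return Instance(f"i-sim{next(_ids):04d}")


def reconcile(fleet, minimum, desired, maximum):
    """Return the fleet after one simulated self-healing pass."""
    # Desired capacity is always held inside the [minimum, maximum] band.
    target = max(minimum, min(desired, maximum))
    # Unhealthy instances are terminated (dropped from the fleet).
    survivors = [inst for inst in fleet if inst.healthy]
    # Replacements are launched until the target count is reached.
    while len(survivors) < target:
        survivors.append(launch())
    # If the fleet is above target, trim it (scale in).
    return survivors[:target]


fleet = [launch() for _ in range(4)]
fleet[1].healthy = False  # simulate a failed health check
fleet = reconcile(fleet, minimum=2, desired=4, maximum=10)
print([inst.instance_id for inst in fleet])
```

After the pass the fleet is back at four healthy instances, with the failed one replaced by a fresh launch.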

The Four Scaling Policy Types

1. Target Tracking (recommended)

You pick one metric and a target value; the ASG adds or removes instances to hold it. Example: keep Average CPU utilization at 50%, or ALBRequestCountPerTarget at 1000. AWS creates and manages the CloudWatch alarms for you, and instance warm-up takes the place of a hand-tuned cooldown.
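As a rough mental model, target tracking behaves like a proportional controller: capacity is scaled by the ratio of the observed metric to the target. The function below is only that approximation (the real policy scales in more conservatively and works through CloudWatch alarms). The request dictionary mirrors the parameters of the boto3 `put_scaling_policy` call; the group and policy names are hypothetical.

```python
import math


def target_tracking_capacity(current, metric, target, minimum, maximum):
    """Approximate capacity needed to bring the metric back to the target."""
    wanted = math.ceil(current * metric / target)
    return max(minimum, min(wanted, maximum))


# 4 instances averaging 75% CPU against a 50% target.
print(target_tracking_capacity(4, 75.0, 50.0, minimum=2, maximum=10))

# Parameters in the shape boto3's put_scaling_policy expects.
policy_request = {
    "AutoScalingGroupName": "web-asg",  # hypothetical name
    "PolicyName": "cpu-at-50",          # hypothetical name
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingConfiguration": {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
}
```

With boto3 installed and credentials configured, the dictionary could be passed as `client.put_scaling_policy(**policy_request)`; the sketch deliberately stops short of making the call.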

2. Step Scaling

Reacts in graduated steps to how badly an alarm is breached:

CPU breach | Action
50–70% | +1 instance
70–90% | +2 instances
>90% | +3 instances
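The table above can be expressed as step adjustments relative to an alarm threshold of 50% CPU, which is how step policies define their intervals. This is a minimal sketch: the `STEPS` tuples and the boundary handling (lower bound inclusive, upper bound exclusive) are assumptions chosen for illustration, so a reading of exactly 70% falls in the +2 step.

```python
THRESHOLD = 50.0  # alarm threshold in percent CPU

# (lower bound, upper bound, instances to add), offsets from THRESHOLD.
STEPS = [
    (0.0, 20.0, 1),   # 50% up to 70%
    (20.0, 40.0, 2),  # 70% up to 90%
    (40.0, None, 3),  # 90% and above
]


def step_adjustment(cpu):
    """Return how many instances to add for a given CPU reading."""
    breach = cpu - THRESHOLD
    if breach < 0:
        return 0  # alarm not breached, no action
    for lower, upper, adjustment in STEPS:
        if breach >= lower and (upper is None or breach < upper):
            return adjustment
    return 0


for reading in (45.0, 60.0, 80.0, 95.0):
    print(reading, step_adjustment(reading))
```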

3. Scheduled Scaling

Changes min/desired/max at a fixed time for known patterns — e.g., raise desired to 20 at 08:00 weekdays, drop to 4 at 20:00. Proactive, not metric-driven.
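The weekday example can be written as two recurring scheduled actions. In the sketch below, the `SCHEDULE` entries use the cron-style `Recurrence` field and `DesiredCapacity` that scheduled actions accept; the action names are hypothetical, and times are assumed to be in whatever time zone the action is configured for (UTC unless a time zone is set). The `desired_at` helper simply simulates the capacity those two actions would leave in place.

```python
from datetime import datetime

SCHEDULE = [
    {
        "ScheduledActionName": "weekday-scale-out",  # hypothetical name
        "Recurrence": "0 8 * * 1-5",                 # 08:00, Monday to Friday
        "DesiredCapacity": 20,
    },
    {
        "ScheduledActionName": "weekday-scale-in",   # hypothetical name
        "Recurrence": "0 20 * * 1-5",                # 20:00, Monday to Friday
        "DesiredCapacity": 4,
    },
]


def desired_at(moment):
    """Simulate the desired capacity the two actions produce at a moment."""
    is_weekday = moment.weekday() < 5         # Monday is 0, Friday is 4
    in_day_window = 8 <= moment.hour < 20     # between the two actions
    return 20 if is_weekday and in_day_window else 4


print(desired_at(datetime(2026, 6, 1, 9, 0)))  # a Monday morning
print(desired_at(datetime(2026, 6, 6, 9, 0)))  # a Saturday morning
```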

4. Predictive Scaling

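Uses machine learning on up to 14 days of CloudWatch history to forecast load for the next 48 hours, refreshes that forecast every six hours, and launches capacity before demand arrives. It is ideal for cyclical daily or weekly traffic.

Simple scaling still exists but is legacy: after each scaling activity it waits for the cooldown to expire before responding to further alarms, so it reacts slowly to a sustained spike. Avoid it on new designs.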
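The forecast-then-provision idea behind predictive scaling can be illustrated with a deliberately naive stand-in: average the load seen at each hour of the day, then size the fleet for that forecast ahead of time. This is not the model AWS uses; the per-hour averaging, the `per_instance` capacity figure, and the sample numbers are all assumptions made for the sketch.

```python
import math
from statistics import mean


def forecast_hourly(history):
    """history: list of days, each a list of 24 hourly load values."""
    return [mean(day[hour] for day in history) for hour in range(24)]


def capacity_plan(forecast, per_instance, minimum, maximum):
    """Instances to have ready for each forecast hour, clamped to the band."""
    return [
        max(minimum, min(math.ceil(load / per_instance), maximum))
        for load in forecast
    ]


# Two hypothetical days: quiet overnight, busy from 09:00 to 17:59.
day_one = [9000 if 9 <= h < 18 else 1000 for h in range(24)]
day_two = [11000 if 9 <= h < 18 else 1000 for h in range(24)]

plan = capacity_plan(
    forecast_hourly([day_one, day_two]),
    per_instance=1000,  # assumed requests per minute one instance can serve
    minimum=2,
    maximum=20,
)
print(plan[3], plan[9])
```

The point of the sketch is the ordering: capacity for 09:00 is known before 09:00, so instances can be in service when the ramp begins.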

On the Exam: "Users see slow responses every morning during the predictable ramp-up" → scheduled or predictive scaling, because reactive policies (target tracking, step) only act after the metric rises, leaving a latency gap.

Launch Templates and Instance Refresh

Feature | Launch Template | Launch Configuration
Status | Current | Legacy
Versioning | Yes | No
Mixed instance types | Yes | No
Spot + On-Demand mix | Yes | Limited

To roll out a new AMI safely, create a new template version, set it as default (or point the ASG at the new version), then start an Instance Refresh with a minimum healthy percentage (e.g., 90%). The ASG replaces instances in batches and waits for each batch to warm up and pass health checks before starting the next, so the group keeps serving traffic throughout. Do not patch running instances in place.
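To see how the minimum healthy percentage shapes a rollout, the sketch below splits a fleet into replacement batches. The batch arithmetic is an illustrative assumption (keep at least the required share in service, but always replace at least one instance at a time), not a statement of the exact algorithm the service uses. The request dictionary follows the shape of the boto3 `start_instance_refresh` parameters, with a hypothetical group name.

```python
import math


def refresh_batches(instance_ids, min_healthy_percentage):
    """Split a fleet into batches that respect the healthy-capacity floor."""
    total = len(instance_ids)
    must_stay = math.ceil(total * min_healthy_percentage / 100)
    # Always make progress: replace at least one instance per batch.
    batch_size = max(1, total - must_stay)
    return [
        instance_ids[start:start + batch_size]
        for start in range(0, total, batch_size)
    ]


ids = [f"i-{n:02d}" for n in range(10)]  # hypothetical instance IDs
print(len(refresh_batches(ids, 90)))  # one instance at a time
print(len(refresh_batches(ids, 50)))  # half the fleet per batch

refresh_request = {
    "AutoScalingGroupName": "web-asg",  # hypothetical name
    "Strategy": "Rolling",
    "Preferences": {"MinHealthyPercentage": 90, "InstanceWarmup": 300},
}
```

A higher percentage means smaller batches and a slower but safer rollout; a lower one finishes faster at the cost of headroom.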

Cooldowns, Warm-Up, and Cost

The scaling cooldown (default 300 seconds, used by simple scaling) blocks further actions until the previous one settles; target tracking and step scaling use instance warm-up instead. Mixed-instances policies with Spot Instances cut cost dramatically for fault-tolerant fleets, while On-Demand base capacity protects against Spot interruption. Lifecycle hooks let you pause a launch or a termination to run bootstrap or drain logic.
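The On-Demand base plus Spot mix comes down to simple arithmetic. The sketch below assumes two settings from a mixed-instances policy, an On-Demand base capacity and an On-Demand percentage above that base, and rounds the On-Demand share up; the rounding direction is an assumption made for illustration.

```python
import math


def split_capacity(desired, on_demand_base, on_demand_pct_above_base):
    """Return (on_demand, spot) instance counts for a desired capacity."""
    # The base is filled with On-Demand first, up to the desired count.
    base = min(desired, on_demand_base)
    above_base = desired - base
    # Assumed rounding: the On-Demand share above the base rounds up.
    extra_on_demand = math.ceil(above_base * on_demand_pct_above_base / 100)
    on_demand = base + extra_on_demand
    return on_demand, desired - on_demand


# 10 instances, 2 On-Demand as a base, 25% On-Demand above the base.
print(split_capacity(10, on_demand_base=2, on_demand_pct_above_base=25))
```

In this example four instances are On-Demand and six are Spot, so a Spot interruption can never take the fleet below four instances.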

Termination Policy, AZ Rebalancing, and Lifecycle Hooks

When the ASG scales in, the termination policy decides which instance dies. The default tries to keep AZs balanced, then removes the instance with the oldest Launch Template/Configuration, then the one closest to the next billing hour — you can override with custom policies. Availability Zone rebalancing is why the ASG sometimes briefly launches in one AZ before terminating in another after an AZ recovers: it restores even spread for resilience, accepting a short period above desired capacity.
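The default ordering described above can be sketched as a chain of filters. This is a simplified illustration: the `Member` fields are hypothetical, template age is reduced to a version number, and the final tie-break by instance ID is an assumption added to keep the result deterministic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    instance_id: str
    az: str
    template_version: int          # lower means older
    minutes_to_billing_hour: int


def pick_for_termination(fleet):
    """Pick one instance using a simplified default termination order."""
    # 1. Prefer the AZ(s) with the most instances, to restore balance.
    az_counts = Counter(m.az for m in fleet)
    busiest = max(az_counts.values())
    candidates = [m for m in fleet if az_counts[m.az] == busiest]
    # 2. Among those, prefer the oldest launch template version.
    oldest = min(m.template_version for m in candidates)
    candidates = [m for m in candidates if m.template_version == oldest]
    # 3. Finally, take the one closest to its next billing hour.
    return min(
        candidates,
        key=lambda m: (m.minutes_to_billing_hour, m.instance_id),
    )


fleet = [
    Member("i-a1", "us-east-1a", 1, 40),
    Member("i-a2", "us-east-1a", 2, 10),
    Member("i-a3", "us-east-1a", 1, 15),
    Member("i-b1", "us-east-1b", 1, 5),
    Member("i-b2", "us-east-1b", 2, 50),
]
print(pick_for_termination(fleet).instance_id)
```

Here `us-east-1a` holds three of the five instances, so the victim comes from that AZ even though `i-b1` is closer to its billing hour.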

Lifecycle hooks pause an instance in a Pending:Wait or Terminating:Wait state so you can run setup (install software, register with a config system) before it goes into service, or drain connections and ship logs before it terminates. If nothing completes the hook, it expires after its heartbeat timeout (3600 seconds by default) and applies its default result, which is ABANDON unless you configure CONTINUE, so always call CompleteLifecycleAction as soon as your work is done.
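A bootstrap or drain script typically ends by reporting its outcome to the hook. The helper below only builds the request; its keys follow the parameters of the boto3 `complete_lifecycle_action` call, while the group name, hook name, and instance ID are hypothetical placeholders.

```python
def complete_action_request(asg_name, hook_name, instance_id, succeeded):
    """Build the parameters for completing a lifecycle action."""
    # CONTINUE lets the instance proceed; ABANDON gives up on it.
    result = "CONTINUE" if succeeded else "ABANDON"
    return {
        "AutoScalingGroupName": asg_name,
        "LifecycleHookName": hook_name,
        "InstanceId": instance_id,
        "LifecycleActionResult": result,
    }


request = complete_action_request(
    asg_name="web-asg",                  # hypothetical
    hook_name="bootstrap-hook",          # hypothetical
    instance_id="i-0123456789abcdef0",   # hypothetical
    succeeded=True,
)
print(request["LifecycleActionResult"])
```

With boto3 available, the dictionary could be sent as `client.complete_lifecycle_action(**request)`. Reporting promptly matters because the instance otherwise sits in the wait state until the heartbeat timeout expires.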

Scaling on the Right Metric

CPU is the textbook target-tracking metric, but it is often the wrong one. For a fleet behind an ALB, ALBRequestCountPerTarget scales directly on user load and avoids the lag of CPU. For queue workers, scale on SQS ApproximateNumberOfMessagesVisible (often via a target-tracking custom metric of backlog per instance). For memory-bound apps, publish a custom CloudWatch metric, because EC2 does not report memory utilization to CloudWatch by default.

Workload | Best scaling signal
Web fleet behind ALB | ALBRequestCountPerTarget
Queue consumers | SQS messages visible / backlog per instance
CPU-bound compute | Average CPU utilization
Memory-bound app | Custom memory metric via CloudWatch agent
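The backlog-per-instance signal for queue consumers is worth working through once. In the sketch below, the target backlog is the acceptable latency divided by the average time to process one message; the specific numbers (10 seconds of acceptable latency, 100 ms per message, 1500 visible messages, 10 instances) are assumptions chosen for illustration.

```python
import math


def target_backlog_per_instance(acceptable_latency_ms, processing_time_ms):
    """Messages one instance may have queued and still meet the latency goal."""
    return acceptable_latency_ms / processing_time_ms


def instances_needed(messages_visible, target_backlog):
    """Fleet size that brings backlog per instance down to the target."""
    return math.ceil(messages_visible / target_backlog)


target = target_backlog_per_instance(10_000, 100)  # 100 messages per instance
current_backlog = 1500 / 10                         # 150 per instance today
print(target, current_backlog, instances_needed(1500, target))
```

The fleet is carrying 150 messages per instance against a target of 100, so it needs 15 instances rather than 10. Publishing that ratio as a custom metric gives target tracking a signal that follows the queue directly.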

High Availability Through the ASG

Resilience comes from spreading the ASG across at least two, ideally three, AZs and letting Auto Scaling restore the desired count after any AZ or instance failure. Combine this with an ALB doing health checks, ELB health-check type on the ASG, and stateless instances (session and data stored in DynamoDB, ElastiCache, RDS, or S3) so any instance can be replaced freely. That combination — multi-AZ ASG + ELB + externalized state — is the canonical "highly available, self-healing web tier" answer the SAA-C03 exam rewards, and it pairs directly with the disaster-recovery and Multi-AZ topics in the rest of this domain.
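A small simulation shows what "restore the desired count" means when an AZ disappears. The even-spread rule below is a simplification of how the group balances instances, and the function name is hypothetical; the AZ names are the ones used earlier in this section.

```python
def spread_across_azs(desired, healthy_azs):
    """Distribute the desired count as evenly as possible over healthy AZs."""
    if not healthy_azs:
        raise ValueError("no healthy Availability Zones to launch into")
    base, extra = divmod(desired, len(healthy_azs))
    # The first `extra` AZs (in sorted order) take one additional instance.
    return {
        az: base + (1 if index < extra else 0)
        for index, az in enumerate(sorted(healthy_azs))
    }


print(spread_across_azs(4, ["us-east-1a", "us-east-1b"]))
print(spread_across_azs(4, ["us-east-1b"]))  # us-east-1a has failed
```

With both AZs healthy the group runs two instances in each; when one AZ fails, the desired count of four is rebuilt entirely in the surviving AZ, which is why stateless instances matter.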

Test Your Knowledge

A retail site has highly predictable traffic that ramps up at 9 AM and falls at 6 PM daily, and users currently experience slow responses during the morning ramp. Which approach removes the latency gap most directly?

An Auto Scaling group spans two AZs with min 2, desired 4, max 8. One AZ fails completely. What does Auto Scaling do?

A team must deploy a new AMI to an ASG without taking the service offline, replacing instances gradually. What is the correct sequence?
