2.2 EC2 Auto Scaling — Dynamic, Predictive, and Scheduled
Key Takeaways
- An Auto Scaling group (ASG) maintains a min/desired/max instance count and self-heals by terminating unhealthy instances and launching replacements.
- Target tracking is the simplest and AWS-recommended dynamic policy; step scaling reacts in graduated steps to alarm breach size; simple scaling is legacy.
- Predictive scaling uses machine learning to forecast load up to 48 hours ahead and provisions before demand; scheduled scaling pre-provisions for known calendar events.
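- Launch Templates (versioned, support mixed instances and Spot) replace legacy Launch Configurations; Instance Refresh rolls a new template version across the group in batches, so the service stays up as long as the minimum healthy percentage can carry the load.
- The health-check grace period defaults to 300 seconds for groups created in the console (0 seconds via the CLI or SDKs), and ELB health checks let the ASG replace instances that pass EC2 status checks but fail the application.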
Quick Answer: Auto Scaling keeps capacity between a min and max, driving toward a desired count. Use target tracking for simple metric goals, step scaling for graduated reactions, scheduled scaling for known calendar spikes, and predictive scaling to pre-provision forecasted demand. Spread the ASG across multiple AZs for resilience.
Auto Scaling Group Fundamentals
An Auto Scaling group (ASG) manages a fleet of EC2 instances as one logical unit defined by three numbers and a Launch Template.
| Parameter | Meaning | Example |
|---|---|---|
| Minimum | Floor — never go below | 2 |
| Desired | Target the ASG drives toward | 4 |
| Maximum | Ceiling — never exceed | 10 |
| Launch Template | AMI, instance type, security group, IAM role, user data | lt-0abc... |
| Subnets | AZs to place instances in | us-east-1a/1b/1c |
| Health check type | EC2 (status checks) or ELB (target health) | ELB |
Self-healing is the headline resilience feature: if an instance fails its health check, the ASG terminates it, launches a replacement from the Launch Template, and registers the new instance with the load balancer, with no human action required. Set the health check type to ELB so the ASG also replaces instances that pass low-level EC2 status checks but fail the application health check on the target group. The health-check grace period (300 seconds by default in the console) gives new instances time to boot: health-check failures during that window do not trigger replacement.
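As a rough illustration of the reconcile behaviour described above, the following pure-Python sketch models a group that discards unhealthy instances and launches replacements until it reaches the desired count. It is a toy model, not the service's implementation; `Instance`, `clamp_desired`, `reconcile`, and the instance IDs are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


def clamp_desired(minimum: int, desired: int, maximum: int) -> int:
    """Keep the desired count inside the [minimum, maximum] bounds."""
    return max(minimum, min(desired, maximum))


def reconcile(fleet, minimum, desired, maximum, id_source):
    """Drop unhealthy instances, then launch or trim to reach the desired count."""
    target = clamp_desired(minimum, desired, maximum)
    survivors = [i for i in fleet if i.healthy]
    while len(survivors) < target:
        # Hypothetical ID format; real instance IDs are assigned by EC2.
        survivors.append(Instance(f"i-new{next(id_source)}"))
    return survivors[:target]


fleet = [Instance("i-a"), Instance("i-b", healthy=False), Instance("i-c"), Instance("i-d")]
fleet = reconcile(fleet, minimum=2, desired=4, maximum=10, id_source=count(1))
print([i.instance_id for i in fleet])  # ['i-a', 'i-c', 'i-d', 'i-new1']
```

The unhealthy `i-b` is dropped and one replacement is launched, so the group returns to four instances without any operator input.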
The Four Scaling Policy Types
1. Target Tracking (recommended)
You pick one metric and a target value; the ASG adds or removes instances to hold it. Example: keep Average CPU utilization at 50%, or ALBRequestCountPerTarget at 1000. AWS auto-creates the CloudWatch alarms and manages cooldown for you.
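The text above does not say how target tracking sizes each adjustment, and the exact algorithm is not something this sketch claims to reproduce. A common mental model, assuming the per-instance metric falls in proportion as instances are added, is proportional sizing. `target_tracking_estimate` is a hypothetical helper that illustrates that assumption only.

```python
import math


def target_tracking_estimate(current_capacity, metric_value, target_value,
                             minimum, maximum):
    """Estimate the capacity that would bring the metric back to its target.

    Assumes total load is constant, so the average metric is inversely
    proportional to the number of instances (an approximation).
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    raw = math.ceil(current_capacity * metric_value / target_value)
    return max(minimum, min(raw, maximum))


# 4 instances averaging 80% CPU against a 50% target
print(target_tracking_estimate(4, 80, 50, minimum=2, maximum=10))  # 7
```

Under this model the same group at 25% CPU would shrink to 2 instances, and the result never leaves the min/max bounds.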
2. Step Scaling
Reacts in graduated steps to how badly an alarm is breached:
| CPU breach | Action |
|---|---|
| 50–70% | +1 instance |
| 70–90% | +2 instances |
| >90% | +3 instances |
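The table above can be sketched as step adjustments expressed as offsets from an alarm threshold, which is the shape the step scaling API uses (`MetricIntervalLowerBound`, `MetricIntervalUpperBound`, `ScalingAdjustment`). The 50% threshold and the handling of the exact boundary values (lower bound inclusive, upper bound exclusive) are assumptions for this example; `instances_to_add` is a hypothetical helper.

```python
ALARM_THRESHOLD = 50.0  # assumed CloudWatch alarm threshold, in percent CPU

# Bounds are offsets from the alarm threshold, not absolute CPU values.
STEP_ADJUSTMENTS = [
    {"MetricIntervalLowerBound": 0.0, "MetricIntervalUpperBound": 20.0, "ScalingAdjustment": 1},
    {"MetricIntervalLowerBound": 20.0, "MetricIntervalUpperBound": 40.0, "ScalingAdjustment": 2},
    {"MetricIntervalLowerBound": 40.0, "ScalingAdjustment": 3},  # no upper bound
]


def instances_to_add(cpu_percent):
    """Return how many instances the matching step would add."""
    offset = cpu_percent - ALARM_THRESHOLD
    if offset < 0:
        return 0  # alarm not breached
    for step in STEP_ADJUSTMENTS:
        lower = step["MetricIntervalLowerBound"]
        upper = step.get("MetricIntervalUpperBound", float("inf"))
        if lower <= offset < upper:
            return step["ScalingAdjustment"]
    return 0


for cpu in (45, 60, 75, 95):
    print(cpu, instances_to_add(cpu))
```

Expressing the steps as offsets means the same adjustment list keeps working if the alarm threshold is later moved.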
3. Scheduled Scaling
Changes min/desired/max at a fixed time for known patterns — e.g., raise desired to 20 at 08:00 weekdays, drop to 4 at 20:00. Proactive, not metric-driven.
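The weekday schedule in the example could be expressed as two recurring actions. The sketch below only builds the request parameters and does not call AWS. The key names follow my understanding of the `put_scheduled_update_group_action` operation; the group name `web-asg`, the action names, the time zone, and the choice to run the evening action on weekdays only are assumptions.

```python
def scheduled_action(name, cron, desired, group="web-asg",
                     time_zone="America/New_York"):
    """Build keyword arguments for one recurring scheduled action."""
    return {
        "AutoScalingGroupName": group,   # hypothetical group name
        "ScheduledActionName": name,
        "Recurrence": cron,              # minute hour day-of-month month day-of-week
        "DesiredCapacity": desired,
        "TimeZone": time_zone,           # assumed; the default is UTC
    }


morning = scheduled_action("weekday-scale-out", "0 8 * * 1-5", 20)
evening = scheduled_action("weekday-scale-in", "0 20 * * 1-5", 4)

# With boto3 installed and credentials configured, each dict could be sent as:
#   boto3.client("autoscaling").put_scheduled_update_group_action(**morning)
print(morning["Recurrence"], evening["Recurrence"])
```

A desired capacity of 20 only takes effect if the group's maximum is at least 20, so a real action may also need to raise the maximum.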
4. Predictive Scaling
Uses machine learning on up to two weeks of history to forecast load up to 48 hours ahead, refreshes the forecast every 6 hours, and launches capacity before demand arrives. It is ideal for cyclical daily or weekly traffic.
Simple scaling still exists but is legacy: after each scaling activity it waits for the cooldown to expire before responding to further alarms. Avoid it in new designs.
On the Exam: "Users see slow responses every morning during the predictable ramp-up" → scheduled or predictive scaling, because reactive policies (target tracking, step) only act after the metric rises, leaving a latency gap.
Launch Templates and Instance Refresh
| Feature | Launch Template | Launch Configuration |
|---|---|---|
| Status | Current | Legacy |
| Versioning | Yes | No |
| Mixed instance types | Yes | No |
| Spot + On-Demand mix | Yes | Limited |
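To roll out a new AMI safely, create a new template version, make sure the ASG references it (for example, by setting it as the default version), then start an Instance Refresh with a minimum healthy percentage (e.g., 90%). The ASG replaces instances in batches and waits for each batch to warm up before starting the next, so the service stays available as long as the remaining instances can carry the load. Do not edit instances in place.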
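A minimal sketch of the refresh request follows. It only builds the parameters and does not call AWS. The key names follow my understanding of the `start_instance_refresh` operation; the group name and the 300-second warm-up are assumptions, and `instance_refresh_params` is a hypothetical helper.

```python
def instance_refresh_params(group, min_healthy_pct=90, warmup_seconds=300):
    """Build keyword arguments for a rolling Instance Refresh."""
    if not 0 <= min_healthy_pct <= 100:
        raise ValueError("min_healthy_pct must be between 0 and 100")
    return {
        "AutoScalingGroupName": group,
        "Strategy": "Rolling",
        "Preferences": {
            "MinHealthyPercentage": min_healthy_pct,
            "InstanceWarmup": warmup_seconds,  # assumed warm-up time
        },
    }


params = instance_refresh_params("web-asg")  # hypothetical group name
# With boto3 installed and credentials configured:
#   boto3.client("autoscaling").start_instance_refresh(**params)
print(params["Preferences"])
```

A higher minimum healthy percentage makes the rollout slower but keeps more capacity in service while instances are being replaced.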
Cooldowns, Warm-Up, and Cost
The scaling cooldown (default 300 seconds, used by simple scaling) blocks further actions until the previous one settles; target tracking and step scaling use instance warm-up instead. Mixed-instances policies with Spot Instances cut cost dramatically for fault-tolerant fleets, while On-Demand base capacity protects against Spot interruption. Lifecycle hooks let you pause launch or terminate to run bootstrap or drain logic.
Termination Policy, AZ Rebalancing, and Lifecycle Hooks
When the ASG scales in, the termination policy decides which instance dies. The default tries to keep AZs balanced, then removes the instance with the oldest Launch Template/Configuration, then the one closest to the next billing hour. You can override this with other built-in or custom termination policies. Availability Zone rebalancing explains why the ASG sometimes briefly launches in one AZ before terminating in another after an AZ recovers: it restores an even spread for resilience and accepts a short period above desired capacity.
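The three-step default order described above can be sketched as a filter chain. This is a simplified model of the prose, not the service's actual logic, which has more cases (for example Spot allocation strategies). `Candidate`, its fields, and the alphabetical tie-breaks are hypothetical choices made so the example is deterministic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    instance_id: str
    az: str
    template_version: int          # lower number = older template
    seconds_to_billing_hour: int


def pick_instance_to_terminate(instances):
    """Apply the three default steps in order and return one instance."""
    if not instances:
        raise ValueError("no instances to choose from")
    per_az = Counter(i.az for i in instances)
    # Step 1: the AZ with the most instances (ties broken alphabetically).
    busiest_az = min(per_az, key=lambda az: (-per_az[az], az))
    pool = [i for i in instances if i.az == busiest_az]
    # Step 2: keep only instances on the oldest template version.
    oldest = min(i.template_version for i in pool)
    pool = [i for i in pool if i.template_version == oldest]
    # Step 3: the instance closest to its next billing hour.
    return min(pool, key=lambda i: (i.seconds_to_billing_hour, i.instance_id))


fleet = [
    Candidate("i-a", "us-east-1a", 1, 600),
    Candidate("i-b", "us-east-1a", 2, 100),
    Candidate("i-c", "us-east-1b", 1, 50),
]
print(pick_instance_to_terminate(fleet).instance_id)  # i-a
```

Here `i-c` is closest to its billing hour but survives, because AZ balance is evaluated first and `us-east-1a` holds more instances.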
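Lifecycle hooks pause an instance in a Pending:Wait or Terminating:Wait state so you can run setup (install software, register with a config system) before it goes into service, or drain connections and ship logs before it terminates. If the heartbeat timeout (3600 seconds by default) expires first, the ASG applies the hook's default result, which is ABANDON unless you configure CONTINUE. ABANDON terminates a launching instance, while a terminating instance is terminated either way. Always send a CompleteLifecycleAction as soon as your work is done.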
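Signalling completion might look like the sketch below, which only builds the request parameters and does not call AWS. The key names follow my understanding of the `complete_lifecycle_action` operation; the group, hook, and instance identifiers are hypothetical, as is the helper `complete_hook_params`.

```python
VALID_RESULTS = {"CONTINUE", "ABANDON"}


def complete_hook_params(group, hook, instance_id, result="CONTINUE"):
    """Build keyword arguments that release an instance from a wait state."""
    if result not in VALID_RESULTS:
        raise ValueError(f"result must be one of {sorted(VALID_RESULTS)}")
    return {
        "AutoScalingGroupName": group,
        "LifecycleHookName": hook,
        "InstanceId": instance_id,
        "LifecycleActionResult": result,
    }


# Hypothetical names for illustration only.
params = complete_hook_params("web-asg", "bootstrap-hook", "i-0123456789abcdef0")
# With boto3 installed and credentials configured:
#   boto3.client("autoscaling").complete_lifecycle_action(**params)
print(params["LifecycleActionResult"])  # CONTINUE
```

Sending `ABANDON` from a launch hook is a way to reject an instance whose bootstrap failed, so it never enters service.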
Scaling on the Right Metric
CPU is the textbook target-tracking metric, but it is often wrong. For a fleet behind an ALB, ALBRequestCountPerTarget scales directly on user load and avoids the lag of CPU. For queue workers, scale on SQS ApproximateNumberOfMessagesVisible (often via a target-tracking custom metric of backlog-per-instance). For memory-bound apps, publish a custom CloudWatch metric since EC2 does not expose memory natively.
| Workload | Best scaling signal |
|---|---|
| Web fleet behind ALB | ALBRequestCountPerTarget |
| Queue consumers | SQS messages visible / backlog per instance |
| CPU-bound compute | Average CPU utilization |
| Memory-bound app | Custom memory metric via CloudWatch agent |
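For the queue-consumer row, the backlog-per-instance idea can be worked through with simple arithmetic. The numbers (1,500 visible messages, 10 instances, 0.1 seconds per message, 10 seconds of acceptable latency) are illustrative assumptions, and the function names are hypothetical.

```python
import math


def backlog_per_instance(messages_visible, in_service_instances):
    """Current backlog each instance would have to work through."""
    if in_service_instances <= 0:
        raise ValueError("need at least one in-service instance")
    return messages_visible / in_service_instances


def acceptable_backlog(max_latency_seconds, avg_processing_seconds):
    """Messages one instance can hold while still meeting the latency goal."""
    return max_latency_seconds / avg_processing_seconds


def instances_needed(messages_visible, target_backlog):
    """Fleet size that brings the backlog per instance down to the target."""
    return math.ceil(messages_visible / target_backlog)


target = acceptable_backlog(max_latency_seconds=10, avg_processing_seconds=0.1)
print(backlog_per_instance(1500, 10))  # 150.0
print(instances_needed(1500, target))  # 15
```

The current backlog of 150 per instance exceeds the target of 100, so the fleet would need to grow from 10 to 15 instances. Publishing the backlog per instance as a custom metric lets a target-tracking policy hold it at that target.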
High Availability Through the ASG
Resilience comes from spreading the ASG across at least two, ideally three, AZs and letting Auto Scaling restore the desired count after any AZ or instance failure. Combine this with an ALB doing health checks, ELB health-check type on the ASG, and stateless instances (session and data stored in DynamoDB, ElastiCache, RDS, or S3) so any instance can be replaced freely. That combination — multi-AZ ASG + ELB + externalized state — is the canonical "highly available, self-healing web tier" answer the SAA-C03 exam rewards, and it pairs directly with the disaster-recovery and Multi-AZ topics in the rest of this domain.
A retail site has highly predictable traffic that ramps up at 9 AM and falls at 6 PM daily, and users currently experience slow responses during the morning ramp. Which approach removes the latency gap most directly?
An Auto Scaling group spans two AZs with min 2, desired 4, max 8. One AZ fails completely. What does Auto Scaling do?
A team must deploy a new AMI to an ASG without taking the service offline, replacing instances gradually. What is the correct sequence?