2.2 EC2 Auto Scaling — Dynamic, Predictive, and Scheduled
Key Takeaways
- Auto Scaling groups maintain a desired number of instances and automatically replace unhealthy ones (self-healing).
- Dynamic scaling responds to real-time metrics (CPU, network, custom metrics); target tracking is the simplest policy type and the one AWS recommends for most workloads.
- Predictive scaling uses machine learning to forecast traffic patterns and pre-scale capacity before demand arrives.
- Scheduled scaling adjusts capacity at predetermined times for known traffic patterns (e.g., business hours).
- Launch Templates define the EC2 configuration (AMI, instance type, security group, user data) used by Auto Scaling.
Quick Answer: Auto Scaling maintains the right number of EC2 instances: it adds instances when demand increases, removes them when demand drops, and replaces unhealthy instances automatically. Use target tracking for simple metric-based scaling, predictive scaling for forecasted demand, and scheduled scaling for known patterns.
Auto Scaling Group (ASG) Fundamentals
An Auto Scaling group (ASG) is a collection of EC2 instances treated as a logical unit for automatic scaling and health management.
Key Parameters
| Parameter | Description | Example |
|---|---|---|
| Minimum | Least number of instances to keep running | 2 |
| Desired | Number of instances the group maintains; scaling policies and scheduled actions adjust it within the Minimum/Maximum bounds | 4 |
| Maximum | Most instances allowed | 10 |
| Launch Template | EC2 configuration (AMI, type, SG, user data, IAM role) | lt-0123456789 |
| VPC subnets | Subnets (one per AZ) in which instances launch; spreading across AZs improves availability | Subnets in us-east-1a, 1b, 1c |
| Health Check | EC2 (instance status) or ELB (target health) | ELB |
| Default cooldown | Wait period after a scaling activity before another can start (applies to simple scaling policies) | 300 seconds |
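The parameters in the table map onto the request for the `CreateAutoScalingGroup` API. The sketch below only builds the keyword arguments for boto3's `create_auto_scaling_group` and checks the Minimum ≤ Desired ≤ Maximum invariant; the group name, launch template ID, subnet IDs, and target group ARN are hypothetical placeholders, and the 300-second grace period is an assumed value.

```python
"""Build CreateAutoScalingGroup arguments from the key ASG parameters.

All names and IDs are hypothetical placeholders.
"""


def build_asg_request(
    name: str,
    launch_template_id: str,
    subnet_ids: list[str],
    min_size: int,
    desired: int,
    max_size: int,
    target_group_arns: list[str],
    health_check_type: str = "ELB",
    default_cooldown: int = 300,
) -> dict:
    # Desired capacity must sit inside the [min, max] range.
    if not (0 <= min_size <= desired <= max_size):
        raise ValueError("require 0 <= min <= desired <= max")
    # This sketch only accepts the two health check types from the table.
    if health_check_type not in ("EC2", "ELB"):
        raise ValueError("health_check_type must be 'EC2' or 'ELB'")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateId": launch_template_id, "Version": "$Latest"},
        "MinSize": min_size,
        "DesiredCapacity": desired,
        "MaxSize": max_size,
        # One subnet per AZ; the API expects a comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "HealthCheckType": health_check_type,
        # Assumed grace period so booting instances are not marked unhealthy.
        "HealthCheckGracePeriod": 300,
        "DefaultCooldown": default_cooldown,
        "TargetGroupARNs": target_group_arns,
    }


request = build_asg_request(
    name="web-asg",
    launch_template_id="lt-0123456789",
    subnet_ids=["subnet-aaa", "subnet-bbb", "subnet-ccc"],
    min_size=2,
    desired=4,
    max_size=10,
    target_group_arns=[
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-tg/0123456789abcdef"
    ],
)
print(request["VPCZoneIdentifier"])  # subnet-aaa,subnet-bbb,subnet-ccc
```

With AWS credentials configured, `boto3.client("autoscaling").create_auto_scaling_group(**request)` would submit the request.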
Self-Healing
If an instance fails health checks, Auto Scaling:
- Marks the instance as unhealthy
- Terminates the unhealthy instance
- Launches a replacement instance
- Registers the new instance with the load balancer
This happens automatically with no manual intervention — it is a key resilience feature.
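The four self-healing steps can be modeled as a reconcile loop. This is a toy simulation written for illustration, not AWS code: `ToyGroup`, `reconcile`, and the instance ID format are hypothetical, and the load balancer is reduced to a set of registered IDs.

```python
"""Toy model of the self-healing loop (not AWS code)."""
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical instance ID generator


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


@dataclass
class ToyGroup:
    desired: int
    instances: list[Instance] = field(default_factory=list)
    load_balancer: set[str] = field(default_factory=set)  # registered instance IDs

    def launch(self) -> Instance:
        inst = Instance(f"i-{next(_ids):04d}")
        self.instances.append(inst)
        self.load_balancer.add(inst.instance_id)  # step 4: register with the LB
        return inst

    def reconcile(self) -> list[str]:
        """Run one health-check pass; return the IDs of replaced instances."""
        replaced = []
        for inst in [i for i in self.instances if not i.healthy]:
            # Steps 1-2: the failed instance is marked unhealthy and terminated.
            self.instances.remove(inst)
            self.load_balancer.discard(inst.instance_id)
            replaced.append(inst.instance_id)
        # Step 3: launch replacements until the group is back at desired capacity.
        while len(self.instances) < self.desired:
            self.launch()
        return replaced


group = ToyGroup(desired=4)
group.reconcile()                    # initial launch of 4 instances
group.instances[1].healthy = False   # simulate a failed health check
print(group.reconcile())             # ['i-0002']
print(len(group.instances))          # 4
```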
Scaling Policy Types
1. Target Tracking Scaling (Recommended)
The simplest policy type and the one AWS recommends for most use cases. You define a target metric value, and Auto Scaling adjusts capacity to keep the metric at that target.
| Example Target | Description |
|---|---|
| CPU utilization = 50% | Add instances when average CPU rises above 50%; remove them when it stays below 50% |
| Request count per target = 1000 | Keep ~1000 requests per instance via ALB |
| Average network in = 10 GB | Scale on the average inbound bytes per instance |
| Custom metric | A CloudWatch metric you publish that rises and falls with load per instance (e.g., queue backlog per instance) |
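AWS does not publish the exact target tracking algorithm, but its behavior is roughly proportional: if the metric runs at 1.5 times the target, the group needs about 1.5 times the capacity. The sketch below uses that approximation (an assumption, not AWS's implementation) and also builds boto3 `put_scaling_policy` arguments for the CPU = 50% row using the predefined metric type `ASGAverageCPUUtilization`; the group and policy names are hypothetical.

```python
import math


def estimate_capacity(current: int, metric: float, target: float,
                      min_size: int, max_size: int) -> int:
    """Rough proportional estimate of the capacity that brings metric to target.

    Approximation only: real target tracking also accounts for instance
    warmup and removes capacity more gradually than it adds it.
    """
    needed = math.ceil(current * metric / target)
    return max(min_size, min(max_size, needed))


def cpu_target_tracking_policy(asg_name: str, target_cpu: float) -> dict:
    # Keyword arguments for boto3: autoscaling.put_scaling_policy(**policy)
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"cpu-{int(target_cpu)}-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": target_cpu,
        },
    }


print(estimate_capacity(4, metric=75.0, target=50.0, min_size=2, max_size=10))  # 6
print(estimate_capacity(4, metric=20.0, target=50.0, min_size=2, max_size=10))  # 2
```

The clamp to `min_size`/`max_size` reflects that no policy can push the group outside its Minimum and Maximum.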
2. Step Scaling
Scales by different amounts based on the size of the alarm breach:
| CPU range | Action |
|---|---|
| ≥ 50% and < 70% | Add 1 instance |
| ≥ 70% and < 90% | Add 2 instances |
| ≥ 90% | Add 3 instances |
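In a step scaling policy, step bounds are offsets from the CloudWatch alarm threshold rather than absolute metric values. Assuming a hypothetical alarm that fires at CPU ≥ 50%, the CPU ranges above become the `StepAdjustments` below; `pick_adjustment` mirrors how a breach above the threshold is matched to a step (lower bound inclusive, upper bound exclusive). The group and policy names are hypothetical, and the alarm itself would be created separately in CloudWatch.

```python
ALARM_THRESHOLD = 50.0  # CPU % at which the hypothetical CloudWatch alarm fires

# Bounds are offsets from the alarm threshold; a missing upper bound means +infinity.
STEP_ADJUSTMENTS = [
    {"MetricIntervalLowerBound": 0.0, "MetricIntervalUpperBound": 20.0, "ScalingAdjustment": 1},
    {"MetricIntervalLowerBound": 20.0, "MetricIntervalUpperBound": 40.0, "ScalingAdjustment": 2},
    {"MetricIntervalLowerBound": 40.0, "ScalingAdjustment": 3},
]


def pick_adjustment(cpu: float) -> int:
    """Return how many instances to add for a given average CPU reading."""
    breach = cpu - ALARM_THRESHOLD
    if breach < 0:
        return 0  # alarm not breached, no scaling action
    for step in STEP_ADJUSTMENTS:
        lower = step["MetricIntervalLowerBound"]
        upper = step.get("MetricIntervalUpperBound", float("inf"))
        if lower <= breach < upper:  # lower inclusive, upper exclusive
            return step["ScalingAdjustment"]
    return 0


# Keyword arguments for boto3: autoscaling.put_scaling_policy(**step_policy)
step_policy = {
    "AutoScalingGroupName": "web-asg",
    "PolicyName": "cpu-step-scale-out",
    "PolicyType": "StepScaling",
    "AdjustmentType": "ChangeInCapacity",
    "StepAdjustments": STEP_ADJUSTMENTS,
}

for cpu in (60, 70, 85, 95):
    print(cpu, pick_adjustment(cpu))
```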
3. Simple Scaling
Older policy type: after a scaling activity it waits for the cooldown period to end before it responds to further alarms. Not recommended for new implementations; use target tracking or step scaling instead.
4. Scheduled Scaling
Adjusts capacity at specific dates/times for known traffic patterns:
- Scale up to 20 instances at 8 AM every weekday
- Scale down to 4 instances at 8 PM every weekday
- Scale up for anticipated Black Friday traffic
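The weekday schedule above can be expressed as two recurring scheduled actions. The dictionaries below are keyword arguments for boto3's `put_scheduled_update_group_action`; `Recurrence` uses Unix cron syntax, and `TimeZone` makes it local time rather than UTC. The group name, action names, time zone, and the Minimum of 4 are hypothetical choices, and `scheduled_desired` is a helper written only to show which capacity the two actions leave in effect.

```python
from datetime import datetime

# Hypothetical arguments for autoscaling.put_scheduled_update_group_action(**action)
SCHEDULED_ACTIONS = [
    {
        "AutoScalingGroupName": "web-asg",
        "ScheduledActionName": "weekday-scale-up",
        "Recurrence": "0 8 * * 1-5",   # 08:00 Monday to Friday
        "TimeZone": "America/New_York",
        "MinSize": 4, "MaxSize": 20, "DesiredCapacity": 20,
    },
    {
        "AutoScalingGroupName": "web-asg",
        "ScheduledActionName": "weekday-scale-down",
        "Recurrence": "0 20 * * 1-5",  # 20:00 Monday to Friday
        "TimeZone": "America/New_York",
        "MinSize": 4, "MaxSize": 20, "DesiredCapacity": 4,
    },
]


def scheduled_desired(now: datetime) -> int:
    """Desired capacity the two actions above leave in effect at local time `now`."""
    is_weekday = now.weekday() < 5          # Monday=0 ... Friday=4
    in_business_hours = 8 <= now.hour < 20
    return 20 if is_weekday and in_business_hours else 4


print(scheduled_desired(datetime(2024, 11, 25, 9, 0)))  # 20 (a Monday)
```

A one-time action, such as the Black Friday case, would set `StartTime` instead of `Recurrence`.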
5. Predictive Scaling
Uses machine learning to analyze historical traffic patterns and automatically provisions capacity in advance of predicted demand.
| Feature | Description |
|---|---|
| Forecast | ML model predicts traffic 48 hours in advance |
| Pre-scaling | Instances launched BEFORE demand arrives |
| Retraining | Model retrains daily with latest data |
| Best for | Cyclical patterns (daily, weekly) |
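AWS does not expose its forecasting model, so the sketch below substitutes a deliberately simple seasonal-average forecast (a different technique from AWS's machine learning) to show the idea: forecast each hour from the same hour on previous days, convert the forecast into capacity, and provision it ahead of time. The hourly history, the per-instance capacity of 250, and `buffer_hours` are made-up inputs; in the real service the lead time is the predictive scaling policy's `SchedulingBufferTime` (in seconds).

```python
import math


def forecast_by_hour(history: list[list[float]]) -> list[float]:
    """Average load per hour of day; history[d][h] is the load on day d at hour h."""
    days = len(history)
    return [sum(day[h] for day in history) / days for h in range(24)]


def prescale_plan(forecast: list[float], per_instance: float,
                  buffer_hours: int = 1, min_size: int = 2) -> list[int]:
    """Capacity to have running at each hour, sized for the next `buffer_hours` too."""
    plan = []
    for h in range(24):
        # Look ahead so instances are running before the forecast load arrives.
        upcoming = max(forecast[(h + k) % 24] for k in range(buffer_hours + 1))
        plan.append(max(min_size, math.ceil(upcoming / per_instance)))
    return plan


def day(peak: float) -> list[float]:
    # Made-up data: quiet overnight, busy from 09:00 to 17:59.
    return [peak if 9 <= h < 18 else 100.0 for h in range(24)]


history = [day(900.0), day(1100.0)]
plan = prescale_plan(forecast_by_hour(history), per_instance=250.0)
print(plan[7], plan[8], plan[9], plan[18])  # 2 4 4 2
```

Note that hour 8 already gets 4 instances: capacity for the 09:00 spike is in place an hour early, which is the point of pre-scaling.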
On the Exam: "The application experiences daily traffic spikes at the same time every day, and users experience slow response times during the ramp-up" → Predictive scaling or scheduled scaling (both pre-provision capacity).
Launch Templates vs. Launch Configurations
| Feature | Launch Template | Launch Configuration |
|---|---|---|
| Status | Current, recommended | Legacy, not recommended |
| Versioning | Yes (multiple versions) | No (immutable) |
| Mixed instances | Yes (multiple instance types) | No (single type) |
| Spot + On-Demand | Yes (mixed allocation) | Limited |
| T2/T3 unlimited | Configurable | Limited |
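Versioning and mixed instances are the two launch template features that come up most often. The dictionaries below are keyword arguments for boto3's `ec2.create_launch_template` and `ec2.create_launch_template_version`, plus a `MixedInstancesPolicy` for `create_auto_scaling_group`. Every name, AMI ID, security group ID, instance profile, and the user data script are hypothetical; note that launch templates expect `UserData` to be base64-encoded already.

```python
import base64

USER_DATA = "#!/bin/bash\nyum install -y httpd && systemctl start httpd\n"

# Version 1 of a hypothetical launch template (ec2.create_launch_template).
launch_template_v1 = {
    "LaunchTemplateName": "web-lt",
    "VersionDescription": "initial AMI",
    "LaunchTemplateData": {
        "ImageId": "ami-0aaaaaaaaaaaaaaaa",
        "InstanceType": "t3.medium",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "IamInstanceProfile": {"Name": "web-instance-profile"},
        "UserData": base64.b64encode(USER_DATA.encode()).decode(),
    },
}

# Versioning: version 2 copies version 1 and overrides only the AMI
# (ec2.create_launch_template_version).
launch_template_v2 = {
    "LaunchTemplateName": "web-lt",
    "SourceVersion": "1",
    "VersionDescription": "patched AMI",
    "LaunchTemplateData": {"ImageId": "ami-0bbbbbbbbbbbbbbbb"},
}

# Mixed instance types with Spot + On-Demand, passed as MixedInstancesPolicy
# to autoscaling.create_auto_scaling_group.
mixed_instances_policy = {
    "LaunchTemplate": {
        "LaunchTemplateSpecification": {"LaunchTemplateName": "web-lt", "Version": "$Latest"},
        "Overrides": [{"InstanceType": t} for t in ("t3.medium", "t3a.medium", "m5.large")],
    },
    "InstancesDistribution": {
        "OnDemandBaseCapacity": 2,                  # first 2 instances are On-Demand
        "OnDemandPercentageAboveBaseCapacity": 25,  # 25% On-Demand above the base, rest Spot
        "SpotAllocationStrategy": "price-capacity-optimized",
    },
}
```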
Scaling Cooldown
The default cooldown (300 seconds) applies to simple scaling policies: it prevents Auto Scaling from launching or terminating additional instances until the effects of the previous scaling activity show up in the metrics.
- Scale-out cooldown — Wait before launching more instances
- Scale-in cooldown — Wait before terminating more instances
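Tip: If your instances take a long time to start serving traffic, increase the cooldown period. With target tracking or step scaling, configure instance warmup instead; it tells Auto Scaling how long a new instance needs before its metrics count toward the group's aggregates.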
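The gating effect of a cooldown fits in a few lines. This is a simplified model, not AWS code: `apply_cooldown` is a hypothetical helper that measures the cooldown from the previous accepted trigger, whereas the real service starts the clock after the previous scaling activity finishes.

```python
def apply_cooldown(trigger_times: list[int], cooldown: int = 300) -> list[int]:
    """Return the alarm trigger times (seconds) that would start a scaling activity.

    Simplification: the cooldown is measured from the previous accepted
    trigger, not from the end of the previous scaling activity.
    """
    accepted: list[int] = []
    for t in sorted(trigger_times):
        if not accepted or t - accepted[-1] >= cooldown:
            accepted.append(t)
    return accepted


print(apply_cooldown([0, 100, 350, 400, 700]))  # [0, 350, 700]
```

The triggers at 100 and 400 seconds are ignored because each arrives less than 300 seconds after an accepted one.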
Instance Refresh
Instance refresh updates instances in an ASG without downtime:
- Set a minimum healthy percentage (e.g., 90%)
- Auto Scaling replaces instances in batches
- Each batch is launched, health-checked, and warmed up before the next batch starts
Use cases: Deploy new AMI, update launch template, apply new configuration.
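The minimum healthy percentage determines how many instances can be out of service at once, which in turn sets the batch size. The sketch below is a simplified planning model (a hypothetical helper, not how AWS schedules replacements internally) that keeps at least that percentage in service but always replaces at least one instance at a time; it also shows the boto3 `start_instance_refresh` arguments, with a hypothetical group name and an assumed 300-second warmup.

```python
def refresh_batches(group_size: int, min_healthy_pct: int) -> list[int]:
    """Simplified rolling-replacement plan: number of instances per batch."""
    must_stay = -(-group_size * min_healthy_pct // 100)  # ceiling division
    batch = max(1, group_size - must_stay)  # always make progress
    sizes, remaining = [], group_size
    while remaining > 0:
        sizes.append(min(batch, remaining))
        remaining -= sizes[-1]
    return sizes


# Keyword arguments for boto3: autoscaling.start_instance_refresh(**refresh_request)
refresh_request = {
    "AutoScalingGroupName": "web-asg",
    "Strategy": "Rolling",
    "Preferences": {"MinHealthyPercentage": 90, "InstanceWarmup": 300},
}

print(refresh_batches(10, 90))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(refresh_batches(10, 50))  # [5, 5]
print(refresh_batches(7, 50))   # [3, 3, 1]
```

A higher minimum healthy percentage means smaller batches: safer, but a slower refresh.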
Practice Questions
1. A web application experiences predictable daily traffic spikes at 9 AM and traffic drops at 6 PM. Which scaling approach ensures instances are ready BEFORE the spike?
2. What happens when an EC2 instance in an Auto Scaling group fails its health check?
3. Which Auto Scaling policy type is the SIMPLEST to configure and recommended by AWS for most use cases?
4. An Auto Scaling group uses a Launch Template. The team needs to update the AMI for new instances while keeping existing instances running. What should they do?
5. An application uses an Auto Scaling group with a minimum of 2, desired of 4, and maximum of 8 instances across 2 AZs. If one AZ completely fails, what happens?