2.2 EC2 Auto Scaling — Dynamic, Predictive, and Scheduled

Key Takeaways

  • Auto Scaling groups maintain a desired number of instances and automatically replace unhealthy ones (self-healing).
  • Dynamic scaling responds to real-time metrics (CPU, network, custom metrics); target tracking is the simplest policy type and the one recommended for most use cases.
  • Predictive scaling uses machine learning to forecast traffic patterns and pre-scale capacity before demand arrives.
  • Scheduled scaling adjusts capacity at predetermined times for known traffic patterns (e.g., business hours).
  • Launch Templates define the EC2 configuration (AMI, instance type, security group, user data) used by Auto Scaling.
Last updated: March 2026

Quick Answer: Auto Scaling maintains the right number of EC2 instances: it adds instances when demand increases, removes them when demand drops, and replaces unhealthy instances automatically. Use target tracking for simple metric-based scaling, predictive scaling for forecasted demand, and scheduled scaling for known patterns.

Auto Scaling Group (ASG) Fundamentals

An Auto Scaling group (ASG) is a collection of EC2 instances treated as a single logical unit for automatic scaling and management.

Key Parameters

Parameter | Description | Example
Minimum | Least number of instances to keep running | 2
Desired | Target number of instances (Auto Scaling adjusts toward this) | 4
Maximum | Most instances allowed | 10
Launch Template | EC2 configuration (AMI, type, SG, user data, IAM role) | lt-0123456789
VPC/Subnets | Which AZs to deploy instances in | us-east-1a, 1b, 1c
Health Check | EC2 (instance status) or ELB (target health) | ELB
Cooldown | Wait period after scaling before next action | 300 seconds
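
The relationship between the three size parameters can be sketched in a few lines of Python. This is an illustration only: the GroupSize class and its clamp method are hypothetical names, not part of any AWS SDK. It shows that a requested capacity is always limited to the range between Minimum and Maximum.

```python
from dataclasses import dataclass


@dataclass
class GroupSize:
    """Hypothetical container for the three size parameters of an ASG."""
    minimum: int
    desired: int
    maximum: int

    def __post_init__(self) -> None:
        # The three values must be ordered: minimum <= desired <= maximum.
        if not 0 <= self.minimum <= self.maximum:
            raise ValueError("need 0 <= minimum <= maximum")
        if not self.minimum <= self.desired <= self.maximum:
            raise ValueError("desired must lie between minimum and maximum")

    def clamp(self, requested: int) -> int:
        """Return the requested capacity limited to [minimum, maximum]."""
        return max(self.minimum, min(requested, self.maximum))


# Values taken from the Example column of the table above.
size = GroupSize(minimum=2, desired=4, maximum=10)
print(size.clamp(15))  # 10
print(size.clamp(1))   # 2
```

Whatever a scaling policy asks for, the group never goes below Minimum or above Maximum.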

Self-Healing

If an instance fails health checks, Auto Scaling:

  1. Marks the instance as unhealthy
  2. Terminates the unhealthy instance
  3. Launches a replacement instance
  4. Registers the new instance with the load balancer

No manual intervention is required, which makes self-healing a key resilience feature.
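
The four steps above can be sketched as a small simulation. Nothing here calls AWS: the heal function, the new_instance_id helper, and the instance IDs are all hypothetical stand-ins, and the load balancer is modeled as a plain set of registered IDs.

```python
import itertools

_ids = itertools.count(1)


def new_instance_id() -> str:
    """Hypothetical ID generator standing in for a real EC2 launch."""
    return f"i-{next(_ids):04d}"


def heal(instances: dict, registered: set) -> list:
    """Replace every unhealthy instance and return the replacement IDs.

    `instances` maps instance ID -> True if it passed its health check.
    `registered` is the set of IDs registered with the load balancer.
    """
    replacements = []
    for instance_id, healthy in list(instances.items()):
        if healthy:
            continue
        # Steps 1-2: the instance is marked unhealthy and terminated.
        del instances[instance_id]
        registered.discard(instance_id)
        # Step 3: launch a replacement.
        new_id = new_instance_id()
        instances[new_id] = True
        # Step 4: register the replacement with the load balancer.
        registered.add(new_id)
        replacements.append(new_id)
    return replacements


fleet = {"i-a": True, "i-b": False, "i-c": True}
targets = set(fleet)
launched = heal(fleet, targets)
print(launched)  # ['i-0001']
```

The group size is the same before and after the call; only the unhealthy member has been swapped out.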

Scaling Policy Types

1. Target Tracking Scaling (Recommended)

The simplest policy type and the recommended default for most workloads. You define a target metric value, and Auto Scaling adjusts capacity to keep the metric at or near that target.

Example Target | Description
CPU utilization = 50% | Add instances when CPU > 50%, remove when < 50%
Request count per target = 1000 | Keep ~1000 requests per instance via ALB
Average network in = 10 GB | Scale based on network throughput
Custom metric | Any CloudWatch metric you publish
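
The idea behind target tracking can be approximated with a proportional calculation. The sketch below is a simplification and not the algorithm AWS actually runs: it assumes the average metric falls in inverse proportion to the instance count, and the function name and parameters are hypothetical.

```python
import math


def target_tracking_capacity(current: int, metric: float, target: float,
                             minimum: int, maximum: int) -> int:
    """Estimate the capacity that would bring an average metric back to target.

    Assumes the metric scales inversely with instance count (e.g. average CPU).
    """
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    wanted = math.ceil(current * metric / target)
    # Never leave the group's configured size range.
    return max(minimum, min(wanted, maximum))


# 4 instances at 75% CPU with a 50% target: scale out to 6.
print(target_tracking_capacity(4, metric=75.0, target=50.0, minimum=2, maximum=10))  # 6
# 4 instances at 20% CPU: scale in, but not below the minimum of 2.
print(target_tracking_capacity(4, metric=20.0, target=50.0, minimum=2, maximum=10))  # 2
```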

2. Step Scaling

Scales by different amounts based on the size of the alarm breach:

CPU Range | Action
50-70% | Add 1 instance
70-90% | Add 2 instances
> 90% | Add 3 instances
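
The table above maps directly onto a lookup. In this sketch the function name is hypothetical, and the boundary handling (lower bound inclusive, upper bound exclusive) is an assumption, since the table does not say which range a reading of exactly 70% or 90% belongs to.

```python
def step_adjustment(cpu_percent: float) -> int:
    """Return how many instances to add for a given average CPU reading."""
    # (lower bound inclusive, upper bound exclusive, instances to add)
    steps = [
        (50.0, 70.0, 1),
        (70.0, 90.0, 2),
        (90.0, float("inf"), 3),
    ]
    for lower, upper, add in steps:
        if lower <= cpu_percent < upper:
            return add
    return 0  # below the alarm threshold: no scale-out


print(step_adjustment(55.0))  # 1
print(step_adjustment(95.0))  # 3
```

The larger the breach, the larger the adjustment, which lets the group catch up faster during a sharp spike.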

3. Simple Scaling

Legacy policy type: after each scaling action, it waits for the cooldown period to expire before responding to further alarms. Not recommended for new implementations; use target tracking or step scaling instead.

4. Scheduled Scaling

Adjusts capacity at specific dates/times for known traffic patterns:

  • Scale up to 20 instances at 8 AM every weekday
  • Scale down to 4 instances at 8 PM every weekday
  • Scale up for anticipated Black Friday traffic
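
The weekday schedule in the first two bullets can be modeled as a function of the current time. This is a sketch of the resulting capacity only, not of how schedules are configured; the function name and defaults are hypothetical, and time zones are ignored for simplicity.

```python
from datetime import datetime


def scheduled_capacity(now: datetime, day_size: int = 20, night_size: int = 4) -> int:
    """Return the capacity set by the most recent scheduled action.

    Assumes two recurring weekday actions: 08:00 sets day_size and
    20:00 sets night_size. On weekends the Friday 20:00 value persists.
    """
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    if is_weekday and 8 <= now.hour < 20:
        return day_size
    return night_size


print(scheduled_capacity(datetime(2026, 3, 2, 9, 0)))   # Monday morning: 20
print(scheduled_capacity(datetime(2026, 3, 2, 21, 0)))  # Monday evening: 4
print(scheduled_capacity(datetime(2026, 3, 7, 12, 0)))  # Saturday noon: 4
```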

5. Predictive Scaling

Uses machine learning to analyze historical traffic patterns and provision capacity ahead of predicted demand.

Feature | Description
Forecast | ML model predicts traffic 48 hours in advance
Pre-scaling | Instances launched BEFORE demand arrives
Retraining | Model retrains daily with latest data
Best for | Cyclical patterns (daily, weekly)
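
To make the forecast-then-pre-scale idea concrete, here is a deliberately naive sketch. It is not the ML model AWS uses: it simply averages each hour of the day across past days and converts the forecast load into a capacity plan. All names, the load figures, and the per-instance load of 1000 are hypothetical.

```python
import math
from statistics import mean


def forecast_hourly_load(history: list) -> list:
    """Average each hour of day across past days (naive seasonal forecast).

    `history` holds one list of 24 hourly load values per past day.
    """
    if not history or any(len(day) != 24 for day in history):
        raise ValueError("each day needs exactly 24 hourly values")
    return [mean(day[hour] for day in history) for hour in range(24)]


def capacity_plan(forecast: list, load_per_instance: float,
                  minimum: int, maximum: int) -> list:
    """Turn forecast load into an hourly capacity plan within group limits."""
    plan = []
    for load in forecast:
        wanted = math.ceil(load / load_per_instance)
        plan.append(max(minimum, min(wanted, maximum)))
    return plan


# Two hypothetical days: quiet except for a spike at 09:00.
day_one = [100.0] * 24
day_two = [100.0] * 24
day_one[9], day_two[9] = 4000.0, 5000.0

forecast = forecast_hourly_load([day_one, day_two])
plan = capacity_plan(forecast, load_per_instance=1000.0, minimum=2, maximum=10)
print(plan[8], plan[9])  # 2 5
```

Because the plan is known ahead of time, capacity for the 09:00 spike can be launched before the hour begins rather than in reaction to it.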

On the Exam: "The application experiences daily traffic spikes at the same time every day, and users experience slow response times during the ramp-up" → Predictive scaling or scheduled scaling (both pre-provision capacity).

Launch Templates vs. Launch Configurations

Feature | Launch Template | Launch Configuration
Status | Current, recommended | Legacy, not recommended
Versioning | Yes (multiple versions) | No (immutable)
Mixed instances | Yes (multiple instance types) | No (single type)
Spot + On-Demand | Yes (mixed allocation) | Limited
T2/T3 unlimited | Configurable | Limited

Scaling Cooldown

The cooldown period (default 300 seconds) prevents Auto Scaling from launching or terminating additional instances before the effects of the previous scaling activity are visible.

  • Scale-out cooldown — Wait before launching more instances
  • Scale-in cooldown — Wait before terminating more instances

Tip: If your instances take a long time to warm up, increase the cooldown period. Target tracking and step scaling policies do not use the cooldown; they rely on the instance warmup setting instead.
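
A cooldown behaves like a simple time gate. The sketch below assumes timestamps expressed in seconds and uses a hypothetical can_scale helper; it only illustrates the waiting rule, not how the service tracks activities internally.

```python
from typing import Optional


def can_scale(now: float, last_activity: Optional[float],
              cooldown: float = 300.0) -> bool:
    """Return True when the cooldown since the last scaling activity has elapsed."""
    if last_activity is None:
        return True  # nothing has happened yet, so there is nothing to wait for
    return now - last_activity >= cooldown


print(can_scale(now=400.0, last_activity=200.0))  # False: only 200 s have passed
print(can_scale(now=500.0, last_activity=200.0))  # True: 300 s have passed
```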

Instance Refresh

Instance refresh performs a rolling replacement of the instances in an ASG, so the group keeps serving traffic during the update:

  1. Set a minimum healthy percentage (e.g., 90%)
  2. Auto Scaling replaces instances in batches
  3. Each batch is launched, health-checked, and warmed up before the next batch starts

Use cases: Deploy new AMI, update launch template, apply new configuration.
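
The effect of the minimum healthy percentage on batch size can be sketched as follows. This is an assumption-laden illustration: the refresh_batches function is hypothetical, the rule of always replacing at least one instance per batch is a simplification, and the real service may size batches differently.

```python
import math


def refresh_batches(instance_ids: list, min_healthy_percent: int) -> list:
    """Split instances into replacement batches that respect the healthy floor."""
    if not 0 <= min_healthy_percent <= 100:
        raise ValueError("min_healthy_percent must be between 0 and 100")
    total = len(instance_ids)
    must_stay_healthy = math.ceil(total * min_healthy_percent / 100)
    # Assumption: replace at least one instance per batch so the refresh progresses.
    batch_size = max(1, total - must_stay_healthy)
    return [instance_ids[i:i + batch_size] for i in range(0, total, batch_size)]


ids = [f"i-{n}" for n in range(10)]
print(len(refresh_batches(ids, 90)))  # 10 batches of 1 instance
print(len(refresh_batches(ids, 50)))  # 2 batches of 5 instances
```

A higher percentage keeps more capacity in service but makes the refresh take longer, because each batch is smaller.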

Test Your Knowledge

A web application experiences predictable daily traffic spikes at 9 AM and traffic drops at 6 PM. Which scaling approach ensures instances are ready BEFORE the spike?

What happens when an EC2 instance in an Auto Scaling group fails its health check?

Which Auto Scaling policy type is the SIMPLEST to configure and recommended by AWS for most use cases?

An Auto Scaling group uses a Launch Template. The team needs to update the AMI for new instances while keeping existing instances running. What should they do?

An application uses an Auto Scaling group with a minimum of 2, desired of 4, and maximum of 8 instances across 2 AZs. If one AZ completely fails, what happens?
