7.4 Container Orchestration — ECS, EKS, and ECR
Key Takeaways
- Amazon ECR (Elastic Container Registry) is a managed registry for Docker and OCI container images. It stores, scans, and replicates images, and lifecycle policies expire old ones automatically.
- ECS Service Auto Scaling adjusts the number of tasks based on metrics (CPU, memory, ALB request count, custom CloudWatch metrics).
- ECS Anywhere and EKS Anywhere let you run containers on your own on-premises infrastructure while still managing them with AWS tooling.
- ECS supports service discovery via AWS Cloud Map and service mesh via AWS App Mesh for microservices communication.
- Use ECS for simpler container orchestration; EKS when you need Kubernetes compatibility; Fargate when you want zero server management.
Quick Answer: ECR stores container images. ECS is the AWS-native container orchestrator. EKS runs standard Kubernetes. Both ECS and EKS can run workloads on Fargate (serverless) or on EC2 instances that you manage. Use ECS for simplicity, EKS for Kubernetes compatibility, and ECR for image management.
Amazon ECR (Elastic Container Registry)
ECR is a fully managed Docker container registry.
| Feature | Detail |
|---|---|
| Storage | Stores Docker and OCI images |
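| Encryption | Images encrypted at rest (AES-256 with Amazon S3-managed keys by default, or AWS KMS keys if configured) |
| Scanning | Vulnerability scanning: basic scanning on push or on demand, or enhanced scanning through Amazon Inspector |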
| Lifecycle policies | Automatically clean up old/untagged images |
| Cross-Region | Replication for multi-Region deployments |
| Cross-account | Share images across accounts via repository policies |
| Public registry | ECR Public Gallery for public images |
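
The lifecycle-policy row above can be made concrete with a small sketch. It builds a policy document that expires untagged images a set number of days after they were pushed. The function name `build_lifecycle_policy` and the 14-day default are illustrative assumptions, not values from this guide.

```python
import json


def build_lifecycle_policy(untagged_days: int = 14) -> str:
    """Return an ECR lifecycle policy (JSON text) that expires untagged images.

    The 14-day default is an arbitrary example value.
    """
    policy = {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Expire untagged images older than {untagged_days} days",
                "selection": {
                    "tagStatus": "untagged",
                    "countType": "sinceImagePushed",
                    "countUnit": "days",
                    "countNumber": untagged_days,
                },
                "action": {"type": "expire"},
            }
        ]
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(build_lifecycle_policy())
```

The resulting text is the kind of document you would attach to a repository, for example through the `lifecyclePolicyText` parameter of the ECR `PutLifecyclePolicy` API.
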
ECS Advanced Features
ECS Service Auto Scaling
| Metric | Scaling Trigger |
|---|---|
| ECS Service CPU | Scale when average CPU exceeds target |
| ECS Service Memory | Scale when average memory exceeds target |
| ALB Request Count | Scale based on requests per target |
| Custom CloudWatch | Scale on any custom metric |
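
As a rough illustration of the CPU and memory rows, the sketch below assembles the parameters for a target tracking policy on an ECS service. It only builds a dictionary and makes no AWS calls. The helper name `target_tracking_policy`, the policy naming scheme, and the cooldown values are assumptions chosen for the example.

```python
def target_tracking_policy(cluster: str, service: str, metric: str, target: float) -> dict:
    """Build parameters for an Application Auto Scaling target tracking policy.

    `metric` is "cpu" or "memory"; `target` is the average utilization (%) to hold.
    """
    predefined = {
        "cpu": "ECSServiceAverageCPUUtilization",
        "memory": "ECSServiceAverageMemoryUtilization",
    }
    if metric not in predefined:
        raise ValueError(f"unsupported metric: {metric!r}")
    return {
        "PolicyName": f"{service}-{metric}-target",  # hypothetical naming scheme
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": predefined[metric],
            },
            "ScaleOutCooldown": 60,   # seconds; example value
            "ScaleInCooldown": 120,   # seconds; example value
        },
    }


if __name__ == "__main__":
    print(target_tracking_policy("prod", "orders", "cpu", 60.0))
```

With boto3 installed and the service already registered as a scalable target, a dictionary like this could be passed as keyword arguments to the `put_scaling_policy` call of the `application-autoscaling` client.
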
ECS Service Discovery (AWS Cloud Map)
| Feature | Detail |
|---|---|
| Purpose | Services find each other by name instead of IP |
| DNS-based | Creates Route 53 records automatically |
| API-based | Services can query Cloud Map API |
| Health checks | Automatic health checking of registered services |
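
From the client's point of view, DNS-based discovery is an ordinary name lookup. The sketch below resolves a service by name rather than by a hardcoded IP. The names `orders` and `internal.example` are hypothetical, and the resolver is injectable so the function can be exercised without a real Cloud Map namespace.

```python
import socket
from typing import Callable, List


def discover(service: str, namespace: str, port: int,
             resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Resolve `<service>.<namespace>` and return the distinct IP addresses."""
    hostname = f"{service}.{namespace}"
    results = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in results})


if __name__ == "__main__":
    # Stand-in resolver so the example runs without a private DNS namespace.
    def fake_resolver(host, port, proto=0):
        return [
            (socket.AF_INET, socket.SOCK_STREAM, proto, "", ("10.0.1.12", port)),
            (socket.AF_INET, socket.SOCK_STREAM, proto, "", ("10.0.2.7", port)),
        ]

    print(discover("orders", "internal.example", 8080, resolver=fake_resolver))
```

Inside a VPC with a private DNS namespace, the default `socket.getaddrinfo` resolver would be used and the call would return the addresses of the currently registered tasks.
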
ECS Task Placement Strategies
| Strategy | Description |
|---|---|
| binpack | Place tasks on the instances with the least available CPU or memory, so the fewest instances are used (cost optimization) |
| spread | Distribute across AZs or instances (availability) |
| random | Random placement |
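
The difference between binpack and spread is easiest to see in a toy simulation. The sketch below is a simplified model, not the ECS scheduler: it tracks only memory, spreads across instances rather than AZs, and uses a made-up three-instance fleet.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    name: str
    free_memory: int  # MiB still available on the instance
    tasks: List[str] = field(default_factory=list)


def place(instances: List[Instance], task: str, memory: int, strategy: str) -> Instance:
    """Place one task using a simplified binpack or spread rule."""
    candidates = [i for i in instances if i.free_memory >= memory]
    if not candidates:
        raise RuntimeError("no instance has enough free memory")
    if strategy == "binpack":
        # Fill the instance that has the least free memory left.
        chosen = min(candidates, key=lambda i: i.free_memory)
    elif strategy == "spread":
        # Pick the instance currently running the fewest tasks.
        chosen = min(candidates, key=lambda i: len(i.tasks))
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    chosen.tasks.append(task)
    chosen.free_memory -= memory
    return chosen


def used_instances(strategy: str, task_count: int = 4, memory: int = 512) -> int:
    """Run `task_count` placements on a fresh fleet and count busy instances."""
    fleet = [Instance("i-a", 2048), Instance("i-b", 2048), Instance("i-c", 2048)]
    for n in range(task_count):
        place(fleet, f"task-{n}", memory, strategy)
    return sum(1 for i in fleet if i.tasks)


if __name__ == "__main__":
    print("binpack:", used_instances("binpack"))  # 1 instance in use
    print("spread:", used_instances("spread"))    # 3 instances in use
```

Four 512 MiB tasks fit on a single 2048 MiB instance under binpack, while spread touches all three instances. That is the cost versus availability trade-off the table describes.
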
ECS Capacity Providers
| Provider | Description |
|---|---|
| Fargate | Serverless — AWS manages instances |
| Fargate Spot | Spot pricing for Fargate tasks (up to 70% savings) |
| EC2 Auto Scaling Group | EC2 instances that you manage in an Auto Scaling group |
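
Capacity providers are usually combined in a capacity provider strategy, where each entry can carry a `base` (a minimum number of tasks) and a `weight` (a relative share of the remaining tasks). The sketch below approximates how such a strategy divides tasks between Fargate and Fargate Spot. The function `split_tasks` and its rounding behavior are assumptions for illustration, and the real scheduler may distribute tasks differently.

```python
from typing import Dict, List


def split_tasks(total: int, strategy: List[dict]) -> Dict[str, int]:
    """Approximate how `total` tasks are divided across capacity providers.

    Each strategy item has "capacityProvider", "weight", and optionally "base".
    """
    counts = {s["capacityProvider"]: 0 for s in strategy}
    remaining = total

    # Satisfy each provider's base first.
    for s in strategy:
        base = min(s.get("base", 0), remaining)
        counts[s["capacityProvider"]] += base
        remaining -= base

    weighted = [s for s in strategy if s.get("weight", 0) > 0]
    if remaining and not weighted:
        raise ValueError("tasks remain but no provider has a positive weight")

    # Hand out the rest one at a time to whichever provider is furthest
    # below its weighted share.
    placed = {s["capacityProvider"]: 0 for s in weighted}
    for _ in range(remaining):
        pick = min(weighted, key=lambda s: placed[s["capacityProvider"]] / s["weight"])
        placed[pick["capacityProvider"]] += 1
        counts[pick["capacityProvider"]] += 1
    return counts


if __name__ == "__main__":
    strategy = [
        {"capacityProvider": "FARGATE", "base": 1, "weight": 1},
        {"capacityProvider": "FARGATE_SPOT", "weight": 3},
    ]
    print(split_tasks(9, strategy))  # {'FARGATE': 3, 'FARGATE_SPOT': 6}
```

A base of 1 on Fargate keeps at least one task on non-interruptible capacity, while the 1:3 weights push most of the remaining tasks onto cheaper Fargate Spot.
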
EKS Advanced Features
EKS Node Types
| Type | Description |
|---|---|
| Managed Node Groups | AWS manages EC2 instances for worker nodes |
| Self-managed Nodes | You manage EC2 instances (maximum control) |
| Fargate | Serverless — no nodes to manage |
EKS Pricing
| Component | Cost |
|---|---|
| Control plane | $0.10/hour per cluster (about $73/month) |
| Worker nodes | EC2 or Fargate pricing |
| Fargate | Billed per second for the vCPU and memory each pod requests |
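
The monthly control plane figure follows from the hourly rate and an average month of 730 hours (8,760 hours per year divided by 12). The sketch below reproduces that arithmetic and shows the shape of a per-second Fargate calculation. The Fargate rates are deliberately left as parameters because this guide does not list them, and the values in the usage example are placeholders, not real prices.

```python
HOURS_PER_MONTH = 730  # average month: 8,760 hours per year / 12


def eks_control_plane_monthly(clusters: int = 1, hourly_rate: float = 0.10) -> float:
    """Monthly EKS control plane cost in USD for `clusters` clusters."""
    return clusters * hourly_rate * HOURS_PER_MONTH


def fargate_task_cost(vcpu: float, memory_gb: float, seconds: int,
                      vcpu_hour_rate: float, gb_hour_rate: float) -> float:
    """Cost of one Fargate task billed per second.

    The two rates must be supplied by the caller (look them up for your Region).
    """
    hourly = vcpu * vcpu_hour_rate + memory_gb * gb_hour_rate
    return hourly * seconds / 3600


if __name__ == "__main__":
    print(f"1 cluster:  ${eks_control_plane_monthly():.2f}/month")
    print(f"3 clusters: ${eks_control_plane_monthly(3):.2f}/month")
    # Placeholder rates, for illustration only.
    print(f"task: ${fargate_task_cost(1, 2, 3600, 0.04, 0.004):.4f}")
```

Because the control plane fee is charged per cluster, consolidating workloads into fewer clusters reduces this fixed cost, while worker node cost scales with usage.
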
Decision Matrix
| Requirement | Best Choice |
|---|---|
| Simplest container orchestration | ECS on Fargate |
| Kubernetes compatibility required | EKS |
| Maximum cost optimization for containers | ECS on EC2 Spot Instances, or Fargate Spot |
| Hybrid containers (on-premises + cloud) | ECS Anywhere or EKS Anywhere |
| Short-lived batch containers | ECS/EKS on Fargate |
| GPU workloads | ECS/EKS on EC2 (GPU instances) |
On the Exam: "Run containers with the least operational overhead" → ECS/EKS on Fargate. "Container image management with vulnerability scanning" → ECR with image scanning. "Cost-optimize Fargate containers" → Fargate Spot capacity provider.
A company needs microservices running in containers to discover each other by service name without hardcoding IP addresses. Which AWS service enables this?
Which ECS task placement strategy minimizes the number of EC2 instances used (and therefore cost)?
A company wants to reduce the cost of its ECS Fargate tasks for a batch processing workload that can tolerate interruptions. Which option provides the highest cost savings?
A company stores container images in Amazon ECR. They notice old, untagged images are consuming significant storage. How should they automate cleanup?