7.4 Container Orchestration — ECS, EKS, and ECR
Key Takeaways
- Amazon ECR is the managed Docker/OCI registry: encryption at rest (AES-256 by default, AWS KMS keys optional), vulnerability scanning on push, lifecycle policies to expire old images, and cross-Region replication.
- Amazon ECS is AWS-native orchestration with no control-plane fee; Amazon EKS runs standard Kubernetes and charges $0.10 per hour (about $73/month) per cluster control plane.
- Both ECS and EKS can run on Fargate (serverless, zero node management) or on EC2 instances that you manage (for GPU, custom kernels, or per-instance cost control). "Launch type" is the ECS term for this choice.
- Fargate Spot delivers up to 70 percent savings for interruption-tolerant tasks; the binpack placement strategy packs tasks onto the fewest instances to cut EC2 cost, while spread maximizes availability.
- AWS Cloud Map provides DNS- and API-based service discovery so containers find each other by name instead of hardcoded IPs, and ECS Service Auto Scaling reacts to CPU, memory, ALB request count, or custom metrics.
ECS vs. EKS vs. ECR: The Core Map
Three services dominate container questions. Amazon ECR (Elastic Container Registry) stores images. Amazon ECS (Elastic Container Service) is AWS's own orchestrator with the simplest setup and no control-plane charge. Amazon EKS (Elastic Kubernetes Service) runs upstream Kubernetes for portability and the K8s ecosystem, but its control plane costs $0.10/hour (about $73/month) per cluster.
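The $73 figure follows from simple arithmetic. The sketch below assumes a 730-hour month (8,760 hours divided by 12), which is a common billing approximation and not something the text states; the function name is illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12 months


def eks_control_plane_monthly(clusters: int, hourly_rate: float = 0.10) -> float:
    """Approximate monthly EKS control-plane fee in USD for a number of clusters."""
    return round(clusters * hourly_rate * HOURS_PER_MONTH, 2)


print(eks_control_plane_monthly(1))  # 73.0
print(eks_control_plane_monthly(3))  # 219.0
```

The fee is per cluster, so splitting workloads across many small EKS clusters multiplies it, while ECS adds no control-plane charge at all.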
The compute option is chosen independently of the orchestrator, and both ECS and EKS support either one:
| Launch type | You manage | Pick when |
|---|---|---|
| Fargate | Nothing (serverless) | "least operational overhead", spiky/short tasks |
| EC2 | The instances | GPU, custom AMIs/kernels, Spot fleets, per-host tuning |
Match the requirement to the service:
| Requirement | Best choice |
|---|---|
| Simplest orchestration | ECS on Fargate |
| Kubernetes compatibility / multi-cloud portability | EKS |
| Maximum container cost optimization | ECS on EC2 Spot + Fargate Spot |
| Hybrid (on-prem + cloud) | ECS Anywhere or EKS Anywhere |
| GPU machine-learning workloads | ECS/EKS on EC2 GPU instances |
Trap: "least operational overhead" steers you to Fargate, but "need GPUs" or "need a custom kernel module" forces the EC2 launch type because Fargate exposes no host.
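The two tables and the trap above reduce to a pair of independent choices. The helper below is a memory aid only; the function and parameter names are made up for illustration and it ignores cost, hybrid, and Spot considerations.

```python
def recommend(needs_kubernetes: bool = False,
              needs_gpu: bool = False,
              needs_custom_kernel: bool = False) -> str:
    """Pick an orchestrator and a compute option from three yes/no requirements."""
    # Orchestrator: Kubernetes compatibility is the only reason modeled here for EKS.
    orchestrator = "EKS" if needs_kubernetes else "ECS"
    # Compute: GPUs or kernel-level access rule out Fargate.
    compute = "EC2" if (needs_gpu or needs_custom_kernel) else "Fargate"
    return f"{orchestrator} on {compute}"


print(recommend())                       # ECS on Fargate
print(recommend(needs_gpu=True))         # ECS on EC2
print(recommend(needs_kubernetes=True))  # EKS on Fargate
```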
Amazon ECR: Registry Features That Get Tested
ECR is a fully managed registry for Docker and OCI images, integrated with ECS, EKS, and Lambda container images.
| Feature | What the exam asks |
|---|---|
| Encryption at rest | Images encrypted by default with AES-256 (S3-managed keys); AWS KMS keys are optional |
| Image scanning | Automatic vulnerability scanning (basic, or enhanced via Amazon Inspector) |
| Lifecycle policies | Auto-expire untagged or old images to control storage cost |
| Cross-Region replication | Pull images locally in multi-Region deployments |
| Cross-account access | Share images via repository policies |
| ECR Public Gallery | Host public base images |
Worked example: A CI pipeline pushes a new image on every commit, and untagged layers pile up. The correct, no-code fix is an ECR lifecycle policy such as "expire untagged images older than 14 days" and "keep only the 10 most recent tagged images." Writing a Lambda to delete images, or applying S3 lifecycle rules, is wrong because ECR exposes its own native retention rules.
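The two rules in the worked example can be written as an ECR lifecycle policy document. This is a sketch under assumptions: the `release` tag prefix is hypothetical (a rule that selects tagged images needs a prefix or pattern list), and the helper name is illustrative.

```python
import json


def build_lifecycle_policy(untagged_days: int = 14,
                           keep_tagged: int = 10,
                           tag_prefixes: tuple = ("release",)) -> dict:
    """Build an ECR lifecycle policy: expire stale untagged images, cap tagged ones."""
    return {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Expire untagged images older than {untagged_days} days",
                "selection": {
                    "tagStatus": "untagged",
                    "countType": "sinceImagePushed",
                    "countUnit": "days",
                    "countNumber": untagged_days,
                },
                "action": {"type": "expire"},
            },
            {
                "rulePriority": 2,
                "description": f"Keep only the {keep_tagged} most recent tagged images",
                "selection": {
                    "tagStatus": "tagged",
                    "tagPrefixList": list(tag_prefixes),  # hypothetical prefix
                    "countType": "imageCountMoreThan",
                    "countNumber": keep_tagged,
                },
                "action": {"type": "expire"},
            },
        ]
    }


policy_text = json.dumps(build_lifecycle_policy(), indent=2)
print(policy_text)
```

The resulting JSON string is the kind of document ECR accepts as a repository's lifecycle policy text, for example through the console or the PutLifecyclePolicy API.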
Scanning trap: if a question wants images checked for CVEs automatically on push, that is ECR image scanning, not a third-party tool bolted onto the pipeline.
Scaling, Placement, and Service Discovery
ECS Service Auto Scaling
ECS scales the number of running tasks using Application Auto Scaling. It can target average CPU, average memory, ALB request count per target, or any custom CloudWatch metric. This is distinct from EC2 Auto Scaling, which scales instances; on Fargate there are no instances to scale, only tasks.
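A target-tracking policy is one common way to express this. The sketch below only builds the parameter dictionary in the shape Application Auto Scaling's PutScalingPolicy call expects; it makes no AWS call, and the cluster name, service name, target value, and cooldowns are placeholder assumptions.

```python
# Predefined metric types that correspond to the metrics named in the text.
PREDEFINED_METRICS = {
    "cpu": "ECSServiceAverageCPUUtilization",
    "memory": "ECSServiceAverageMemoryUtilization",
    "alb_requests": "ALBRequestCountPerTarget",
}


def target_tracking_policy(cluster: str, service: str,
                           metric: str = "cpu", target: float = 60.0) -> dict:
    """Parameters for a target-tracking policy on an ECS service's task count."""
    return {
        "PolicyName": f"{service}-{metric}-target",
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",  # tasks, not instances
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": PREDEFINED_METRICS[metric],
            },
            "ScaleOutCooldown": 60,   # seconds, placeholder value
            "ScaleInCooldown": 300,   # seconds, placeholder value
        },
    }


policy = target_tracking_policy("prod-cluster", "inventory")
print(policy["ResourceId"])  # service/prod-cluster/inventory
```

Note that the ALB request-count metric also needs a resource label identifying the load balancer and target group, which this sketch omits.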
Task placement strategies (EC2 launch type)
| Strategy | Behavior | Use for |
|---|---|---|
| binpack | Fills each instance before using the next | Minimizing instance count and cost |
| spread | Distributes evenly across AZs/instances | Maximizing availability |
| random | Places arbitrarily | Simple, no preference |
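The difference between binpack and spread is easiest to see in a toy simulation. The model below is a deliberate simplification, not the ECS scheduler: it tracks a single resource (memory in MiB), ignores Availability Zones, and breaks ties by instance order.

```python
def place(task_sizes, capacities, strategy):
    """Return the number of tasks placed on each instance under a toy strategy."""
    free = list(capacities)
    counts = [0] * len(capacities)
    for need in task_sizes:
        candidates = [i for i, room in enumerate(free) if room >= need]
        if not candidates:
            raise RuntimeError("no instance has room for this task")
        if strategy == "binpack":
            # Prefer the instance with the least remaining room that still fits.
            chosen = min(candidates, key=lambda i: free[i])
        elif strategy == "spread":
            # Prefer the instance currently running the fewest tasks.
            chosen = min(candidates, key=lambda i: counts[i])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        free[chosen] -= need
        counts[chosen] += 1
    return counts


tasks = [512] * 4                # four tasks of 512 MiB each
instances = [2048, 2048, 2048]   # three instances with 2048 MiB free

print(place(tasks, instances, "binpack"))  # [4, 0, 0]
print(place(tasks, instances, "spread"))   # [2, 1, 1]
```

With binpack, two of the three instances stay empty and could be scaled in; with spread, losing any one instance takes out at most half of the tasks.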
Capacity providers and cost
| Provider | Cost note |
|---|---|
| Fargate | Per-second vCPU + memory, no host to manage |
| Fargate Spot | Up to 70% cheaper for interruption-tolerant tasks |
| EC2 Auto Scaling group | Pair with Spot Instances for deepest savings |
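Capacity providers are usually combined in a strategy, where `base` reserves a minimum number of tasks on one provider and `weight` sets the ratio for the rest. The function below approximates that split for illustration; the real ECS rounding behavior is not modeled, and the task count and weights are assumptions.

```python
def split_tasks(total: int, strategy: list) -> dict:
    """Approximate how a capacity provider strategy divides a task count."""
    # Base tasks are placed first.
    counts = {s["capacityProvider"]: s.get("base", 0) for s in strategy}
    remaining = total - sum(counts.values())
    if remaining < 0:
        raise ValueError("base exceeds the total task count")
    total_weight = sum(s["weight"] for s in strategy)
    # The rest is split in proportion to weight (integer floor).
    shares = {s["capacityProvider"]: remaining * s["weight"] // total_weight
              for s in strategy}
    # Any rounding leftover goes to the heaviest provider (a simplification).
    leftover = remaining - sum(shares.values())
    heaviest = max(strategy, key=lambda s: s["weight"])["capacityProvider"]
    shares[heaviest] += leftover
    return {name: counts[name] + shares[name] for name in counts}


strategy = [
    {"capacityProvider": "FARGATE", "base": 1, "weight": 1},
    {"capacityProvider": "FARGATE_SPOT", "weight": 3},
]
print(split_tasks(9, strategy))  # {'FARGATE': 3, 'FARGATE_SPOT': 6}
```

Keeping a small base on regular Fargate means some tasks survive a Spot interruption, while most of the fleet runs at the discounted rate.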
Service discovery
AWS Cloud Map registers ECS tasks under friendly names (for example, inventory.internal), can create the matching Route 53 records automatically, and offers an API for lookups. Containers call each other by name, so references keep working when task IP addresses change as the service scales in and out. For richer traffic control, App Mesh adds a service mesh.
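The value of name-based lookup shows up when the addresses behind a name change. The class below is a toy in-memory stand-in for a registry, not the Cloud Map API; the class name, methods, and IP addresses are all invented for illustration.

```python
class ToyRegistry:
    """In-memory sketch of a name-to-addresses registry."""

    def __init__(self):
        self._instances = {}

    def register(self, name: str, ip: str) -> None:
        self._instances.setdefault(name, set()).add(ip)

    def deregister(self, name: str, ip: str) -> None:
        self._instances.get(name, set()).discard(ip)

    def resolve(self, name: str) -> list:
        return sorted(self._instances.get(name, set()))


registry = ToyRegistry()
registry.register("inventory.internal", "10.0.1.15")
registry.register("inventory.internal", "10.0.2.27")
print(registry.resolve("inventory.internal"))  # ['10.0.1.15', '10.0.2.27']

# One task stops and a replacement starts with a new IP.
registry.deregister("inventory.internal", "10.0.1.15")
registry.register("inventory.internal", "10.0.3.8")
print(registry.resolve("inventory.internal"))  # ['10.0.2.27', '10.0.3.8']
```

Callers only ever hold the name, so the task replacement is invisible to them.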
Decision recap: least operational overhead -> Fargate; cheapest containers for batch -> Fargate Spot or EC2 Spot with binpack; automatic vulnerability scanning + retention -> ECR scanning + lifecycle policy; name-based microservice lookup -> Cloud Map.
Task Definitions, Networking, and Hybrid Options
An ECS task definition is the blueprint the exam keeps probing: it declares the container image (from ECR), CPU and memory, the task role (IAM permissions the application code uses) versus the task execution role (permissions ECS needs to pull the image and write logs), port mappings, and the logging driver. Confusing the two IAM roles is a frequent distractor; the task role is for your code's AWS calls, the execution role is for the ECS agent's plumbing.
Networking mode matters for Fargate: Fargate tasks always use awsvpc mode, giving each task its own elastic network interface and security group, so you secure traffic per task rather than per host. This is why "each container needs its own security group and private IP" points to awsvpc / Fargate.
| Concept | Key fact |
|---|---|
| Task role | IAM permissions for the application's AWS API calls |
| Task execution role | Lets ECS pull the ECR image and push logs to CloudWatch |
| awsvpc mode | One ENI + security group per task (required on Fargate) |
| Load balancing | ALB target group registers tasks dynamically |
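The fields described above map onto a task definition document roughly as follows. This is a sketch: the family name, container port, log group, Region, account ID, and role names are placeholder assumptions, and the helper only builds a dictionary without registering anything.

```python
def fargate_task_definition(image_uri: str,
                            task_role_arn: str,
                            execution_role_arn: str) -> dict:
    """Build a minimal Fargate task definition as a plain dictionary."""
    return {
        "family": "inventory",                    # placeholder name
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",                  # required on Fargate: one ENI per task
        "cpu": "256",
        "memory": "512",
        "taskRoleArn": task_role_arn,             # used by the application code
        "executionRoleArn": execution_role_arn,   # used by ECS to pull the image and ship logs
        "containerDefinitions": [
            {
                "name": "app",
                "image": image_uri,
                "essential": True,
                "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": "/ecs/inventory",
                        "awslogs-region": "us-east-1",
                        "awslogs-stream-prefix": "app",
                    },
                },
            }
        ],
    }


task_def = fargate_task_definition(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/inventory:latest",
    task_role_arn="arn:aws:iam::123456789012:role/inventory-task-role",
    execution_role_arn="arn:aws:iam::123456789012:role/inventory-execution-role",
)
print(task_def["networkMode"])  # awsvpc
```

Keeping the two role ARNs as separate parameters mirrors the distinction the exam tests: permissions for the code versus permissions for the platform.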
Hybrid and edge container deployments
ECS Anywhere and EKS Anywhere bring ECS and EKS scheduling to on-premises or other infrastructure you manage. With ECS Anywhere the control plane stays in the AWS Region and your servers register as external instances; with EKS Anywhere the Kubernetes cluster itself runs on your infrastructure. Choose these when a question requires running containers on existing on-premises servers while keeping ECS or EKS tooling. For low-latency sites that call for AWS-managed hardware, AWS Outposts can run ECS/EKS on AWS-owned racks inside your facility. Outposts relies on a connection to its parent Region, so it is not a fit for fully disconnected sites.
Worked example: A regulated workload must keep some containers on-premises for data-residency reasons but be operated with the same tooling as cloud workloads. The answer is ECS Anywhere (or EKS Anywhere), not lift-and-shift to EC2, because it keeps the same orchestration tooling across both environments instead of introducing a separate stack.
Practice Questions
Microservices in containers must locate each other by a stable service name instead of changing IP addresses as tasks scale in and out. Which AWS capability provides this?
A team wants their ECS-on-EC2 tasks packed onto as few instances as possible to minimize EC2 cost. Which task placement strategy achieves this?
A fault-tolerant batch job runs on ECS Fargate and can tolerate occasional task interruption. Which option cuts cost the most?
A CI pipeline pushes a new image to Amazon ECR on every commit, and untagged image layers are inflating storage costs. What is the lowest-effort fix?