6.3 CloudWatch, CloudTrail, and AWS Config
Key Takeaways
- CloudWatch monitors performance metrics, collects logs, sets alarms, and triggers automated actions — it is the primary monitoring service for AWS resources.
- CloudTrail records API activity in your AWS account (management events by default, data events when enabled). It is the audit trail for WHO did WHAT and WHEN.
- AWS Config records configuration changes over time and evaluates resource configurations against desired rules for compliance.
- CloudWatch = performance monitoring (metrics, logs, alarms). CloudTrail = API audit (who/what/when). Config = configuration compliance (desired state vs. actual state).
- CloudWatch Alarms can trigger Auto Scaling actions, SNS notifications, EC2 actions (stop, terminate, reboot, recover), and Lambda functions.
Quick Answer: CloudWatch = metrics + logs + alarms (how are resources performing?). CloudTrail = API call logging (who did what?). AWS Config = configuration history + compliance rules (is this configured correctly?). Use all three together for complete monitoring and governance.
Amazon CloudWatch
CloudWatch is the primary monitoring and observability service for AWS resources.
CloudWatch Components
| Component | Description |
|---|---|
| Metrics | Time-series data points (CPU utilization, network bytes, custom metrics) |
| Logs | Collect, store, and analyze log files from AWS services and applications |
| Alarms | Watch a metric and trigger actions when threshold is breached |
| Dashboards | Visual display of metrics and logs |
| Events/EventBridge | Respond to state changes in AWS resources (CloudWatch Events is now Amazon EventBridge) |
| Insights | Container Insights, Lambda Insights, Application Insights |
| Synthetics | Canary scripts to monitor API endpoints and URLs |
| Contributor Insights | Identify top-N contributors to metric changes |
Key Metrics
| Service | Default Metrics | Custom Metrics (require agent or SDK) |
|---|---|---|
| EC2 | CPU, Network, Disk I/O, Status Checks | Memory utilization, disk space |
| RDS | CPU, memory, connections, IOPS, storage | N/A (AWS provides comprehensive metrics) |
| Lambda | Invocations, duration, errors, throttles | Custom (published via SDK) |
| ALB | Request count, latency, 4xx/5xx errors | N/A |
Important: EC2 memory utilization is NOT a default CloudWatch metric. You must install the CloudWatch Agent to collect memory and disk-level metrics.
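As a rough illustration of what "install the CloudWatch Agent" involves, the sketch below builds a minimal agent configuration that asks for memory and disk usage. The field names (`metrics_collected`, `mem_used_percent`, `used_percent`) follow the Linux agent's JSON schema as commonly documented and should be checked against the agent documentation. The function name `build_agent_config` and the 60-second interval are assumptions made for this example.

```python
import json


def build_agent_config(interval_seconds: int = 60) -> dict:
    """Return a minimal CloudWatch Agent config for memory and disk usage.

    Hypothetical helper for illustration; it only builds the JSON document
    and does not install or start the agent.
    """
    return {
        "metrics": {
            "metrics_collected": {
                # Memory usage is not among the default EC2 metrics,
                # so the agent must report it from inside the instance.
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
                # Disk space used, for every mounted filesystem ("*").
                "disk": {
                    "measurement": ["used_percent"],
                    "resources": ["*"],
                    "metrics_collection_interval": interval_seconds,
                },
            }
        }
    }


config = build_agent_config()
print(json.dumps(config, indent=2))
```

The printed JSON is the kind of document the agent reads at startup. Deploying it (for example through Systems Manager) is outside the scope of this sketch.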
CloudWatch Alarms
| State | Meaning |
|---|---|
| OK | Metric is within the defined threshold |
| ALARM | Metric has breached the defined threshold |
| INSUFFICIENT_DATA | Not enough data to determine state |
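The three states can be pictured with a deliberately simplified model. The function below is not the CloudWatch evaluation algorithm. It assumes a "greater than threshold" alarm, requires every datapoint in the evaluation window to breach, and treats any missing datapoint as insufficient data. The real service also supports "M out of N" evaluation and configurable missing-data handling. The name `evaluate_alarm` is hypothetical.

```python
from typing import Optional, Sequence


def evaluate_alarm(datapoints: Sequence[Optional[float]],
                   threshold: float,
                   evaluation_periods: int) -> str:
    """Simplified alarm model over the most recent `evaluation_periods` datapoints."""
    recent = list(datapoints)[-evaluation_periods:]
    # Too few datapoints, or a gap (None) in the window: state cannot be determined.
    if len(recent) < evaluation_periods or any(p is None for p in recent):
        return "INSUFFICIENT_DATA"
    # Every datapoint in the window is above the threshold.
    if all(p > threshold for p in recent):
        return "ALARM"
    return "OK"


# CPU utilization samples (percent), oldest first; values are made up.
print(evaluate_alarm([45.0, 85.0, 91.0, 88.0], threshold=80.0, evaluation_periods=3))  # ALARM
print(evaluate_alarm([85.0, 91.0, 60.0], threshold=80.0, evaluation_periods=3))        # OK
print(evaluate_alarm([85.0], threshold=80.0, evaluation_periods=3))                    # INSUFFICIENT_DATA
```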
Alarm Actions:
- Send SNS notification (email, SMS, Lambda)
- Auto Scaling action (add/remove instances)
- EC2 action (stop, terminate, reboot, recover)
- Lambda action (invoke a function)
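To make the alarm actions concrete, here is a sketch of the parameters one might pass to the CloudWatch `PutMetricAlarm` API (in boto3, `put_metric_alarm`). It only builds the request dictionary and makes no AWS call. The instance ID, Region, and SNS topic ARN are placeholders, and the helper name `build_cpu_alarm` is an assumption for this example.

```python
def build_cpu_alarm(instance_id: str, region: str, topic_arn: str) -> dict:
    """Build PutMetricAlarm parameters for a high-CPU alarm on one EC2 instance."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,              # seconds per datapoint
        "EvaluationPeriods": 2,     # consecutive periods that must breach
        "Threshold": 80.0,          # percent
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [
            topic_arn,                                # SNS notification
            f"arn:aws:automate:{region}:ec2:reboot",  # built-in EC2 reboot action
        ],
    }


# Placeholder identifiers, not real resources.
params = build_cpu_alarm(
    instance_id="i-0123456789abcdef0",
    region="us-east-1",
    topic_arn="arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
print(params["AlarmName"], params["AlarmActions"])

# With boto3 installed and credentials configured, the call would look like:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```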
CloudWatch Logs
| Feature | Description |
|---|---|
| Log Groups | Collection of log streams with shared settings |
| Log Streams | Sequence of events from the same source |
| Retention | 1 day to 10 years, or never expire |
| Metric Filters | Extract metrics from log patterns (e.g., count ERROR occurrences) |
| Subscription Filters | Stream logs to Lambda, Kinesis, or OpenSearch in real time |
| Logs Insights | Query logs interactively with a purpose-built, SQL-like query language |
| Cross-account | Centralize logs from multiple accounts |
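The metric filter idea (turning log patterns into a number you can alarm on) can be modeled locally in a few lines. This is a plain-Python imitation of a single-term filter such as `ERROR`, not the CloudWatch Logs service. The log lines are invented and `count_matches` is a hypothetical helper.

```python
from typing import Iterable


def count_matches(log_events: Iterable[str], term: str = "ERROR") -> int:
    """Count log events containing `term` (case-sensitive substring match)."""
    return sum(1 for message in log_events if term in message)


# Invented log events for illustration.
events = [
    "2024-01-01T00:00:01Z INFO request served",
    "2024-01-01T00:00:02Z ERROR upstream timeout",
    "2024-01-01T00:00:03Z WARN slow response",
    "2024-01-01T00:00:04Z ERROR database unavailable",
]

error_count = count_matches(events)
print(error_count)  # 2
```

In the real service the resulting count is published as a CloudWatch metric, which an alarm can then watch like any other metric.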
AWS CloudTrail
CloudTrail records API calls and actions made in your AWS account.
| Feature | Description |
|---|---|
| What it logs | API calls and account activity: who, what, when, where, how |
| Default | Event history is enabled by default (last 90 days of management events) |
| Trail | Configure to deliver logs to S3 for long-term storage |
| Event types | Management events (free), Data events (charged), Insights events |
| Multi-Region | A trail can capture events from all Regions |
| Organization trail | Single trail for all accounts in an organization |
| Integrity | Log file validation detects tampering |
CloudTrail Event Types
| Type | Examples | Cost |
|---|---|---|
| Management Events | Create/delete/modify resources, console sign-in | Free (90-day history) |
| Data Events | S3 object-level operations, Lambda invocations | Charged per event |
| Insights Events | Unusual API activity patterns | Charged |
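A "who did what" investigation usually comes down to filtering event records by `eventName`. The sketch below parses a log document shaped like a CloudTrail log file (a top-level `Records` array whose entries carry `eventTime`, `eventName`, `awsRegion`, `sourceIPAddress`, and `userIdentity`). The sample records, ARNs, and IP addresses are fabricated for illustration, and `who_did` is a hypothetical helper.

```python
import json

# Fabricated sample in the general shape of a CloudTrail log file.
SAMPLE_LOG = json.dumps({
    "Records": [
        {
            "eventTime": "2024-01-02T03:04:05Z",
            "eventSource": "s3.amazonaws.com",
            "eventName": "DeleteBucket",
            "awsRegion": "us-east-1",
            "sourceIPAddress": "203.0.113.10",
            "userIdentity": {"type": "IAMUser",
                             "arn": "arn:aws:iam::123456789012:user/alice"},
        },
        {
            "eventTime": "2024-01-02T04:00:00Z",
            "eventSource": "ec2.amazonaws.com",
            "eventName": "TerminateInstances",
            "awsRegion": "us-west-2",
            "sourceIPAddress": "198.51.100.7",
            "userIdentity": {"type": "IAMUser",
                             "arn": "arn:aws:iam::123456789012:user/bob"},
        },
    ]
})


def who_did(log_text: str, event_name: str) -> list:
    """Return who/what/when/where summaries for records matching `event_name`."""
    records = json.loads(log_text).get("Records", [])
    return [
        {
            "who": r.get("userIdentity", {}).get("arn", "unknown"),
            "what": r.get("eventName"),
            "when": r.get("eventTime"),
            "where": r.get("awsRegion"),
            "source_ip": r.get("sourceIPAddress"),
        }
        for r in records
        if r.get("eventName") == event_name
    ]


matches = who_did(SAMPLE_LOG, "DeleteBucket")
print(matches)
```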
AWS Config
AWS Config records configuration changes and evaluates compliance.
| Feature | Description |
|---|---|
| Configuration recording | Records changes to resource configurations over time |
| Config Rules | Evaluate whether configurations comply with desired settings |
| Compliance dashboard | Visual status of compliant vs. non-compliant resources |
| Remediation | Automatically fix non-compliant resources via SSM Automation |
| Aggregator | Multi-account, multi-Region compliance view |
| Timeline | View configuration history for any resource |
Config Rule Examples
| Rule | What It Checks |
|---|---|
| s3-bucket-versioning-enabled | S3 buckets have versioning enabled |
| ec2-instance-no-public-ip | EC2 instances do not have public IPs |
| rds-instance-public-access-check | RDS instances are not publicly accessible |
| encrypted-volumes | EBS volumes are encrypted |
| iam-root-access-key-check | Root account has no access keys |
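Conceptually, a Config rule is a function from a resource's recorded configuration to a compliance verdict. The following sketch mimics the spirit of `encrypted-volumes` on hand-written configuration items. It is not the managed rule's implementation, the item layout is simplified, and the resource IDs are placeholders. `evaluate_encrypted_volumes` is a hypothetical name.

```python
def evaluate_encrypted_volumes(items: list) -> dict:
    """Map each resource ID to a compliance verdict for an encryption check."""
    results = {}
    for item in items:
        if item.get("resourceType") != "AWS::EC2::Volume":
            # The rule only applies to EBS volumes.
            results[item["resourceId"]] = "NOT_APPLICABLE"
        elif item.get("configuration", {}).get("encrypted"):
            results[item["resourceId"]] = "COMPLIANT"
        else:
            results[item["resourceId"]] = "NON_COMPLIANT"
    return results


# Simplified, invented configuration items.
items = [
    {"resourceType": "AWS::EC2::Volume", "resourceId": "vol-aaa",
     "configuration": {"encrypted": True}},
    {"resourceType": "AWS::EC2::Volume", "resourceId": "vol-bbb",
     "configuration": {"encrypted": False}},
    {"resourceType": "AWS::S3::Bucket", "resourceId": "example-bucket",
     "configuration": {}},
]

verdicts = evaluate_encrypted_volumes(items)
print(verdicts)
```

Resources that come back `NON_COMPLIANT` are the ones a remediation action would target.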
Three Pillars Comparison
| Feature | CloudWatch | CloudTrail | AWS Config |
|---|---|---|---|
| Focus | Performance metrics and logs | API call audit trail | Configuration compliance |
| Question answered | "How is it performing?" | "Who did what?" | "Is it configured correctly?" |
| Data | Metrics, logs, alarms | API call records | Configuration snapshots/changes |
| Use case | Monitor CPU, set alarms | Investigate security incidents | Enforce compliance rules |
On the Exam: "Who terminated the EC2 instance last night?" → CloudTrail. "Is EC2 CPU above 80%?" → CloudWatch. "Are all S3 buckets encrypted?" → AWS Config.
A security team needs to investigate who deleted an S3 bucket last Tuesday. Which service provides this information?
A company needs to ensure all EBS volumes are encrypted and automatically remediate non-compliant volumes. Which service should they use?
An EC2 instance is experiencing high CPU utilization. Which service provides this metric by default?