6.3 CloudWatch, CloudTrail, and AWS Config

Key Takeaways

  • Amazon CloudWatch monitors performance metrics, collects logs, sets alarms, and triggers automated actions — it answers 'how are my resources performing?'
  • AWS CloudTrail records management and data API calls — it is the audit trail answering 'who did what, when, and from where?'
  • AWS Config records resource configuration over time and evaluates rules — it answers 'is this resource configured correctly and compliant?'
  • EC2 memory and disk-space utilization are NOT default CloudWatch metrics; you must install the CloudWatch agent to publish them.
  • CloudWatch retains metric data on a tiered schedule: sub-minute (high-resolution) data points for 3 hours, 1-minute data for 15 days, 5-minute data for 63 days, and 1-hour data for 455 days. Alarm actions include SNS, Auto Scaling, EC2 stop/terminate/reboot/recover, and Lambda.

Last updated: June 2026

Amazon CloudWatch — Performance and Observability

Quick Answer: CloudWatch = metrics + logs + alarms ("how is it performing?"). CloudTrail = API call history ("who did what?"). AWS Config = configuration history + compliance rules ("is it configured correctly?"). The exam loves one-line scenarios that hinge on exactly which of these three answers the question.

CloudWatch ingests metrics (time-series numbers), stores logs, and fires alarms that drive automation. Knowing which metrics exist by default is a recurring exam point.

| Component | What it does |
| --- | --- |
| Metrics | CPUUtilization, NetworkIn/Out, plus custom metrics published via the API |
| Logs | Centralize application and service logs into log groups/streams |
| Alarms | Watch a metric/expression and act when a threshold is breached |
| Dashboards | Visualize metrics and logs across Regions and accounts |
| Logs Insights | Purpose-built query language over log data |
| Synthetics | Canaries that probe URLs/APIs on a schedule |
| Contributor Insights | Identify top-N contributors to a spike |

Default vs. agent-required metrics

| Service | Default metrics | Requires CloudWatch agent |
| --- | --- | --- |
| EC2 | CPUUtilization, NetworkIn/Out, DiskReadOps (instance store volumes), StatusCheckFailed | Memory utilization, disk free space, swap |
| RDS | CPU, FreeableMemory, DatabaseConnections, IOPS | (comprehensive by default) |
| Lambda | Invocations, Duration, Errors, Throttles | Custom business metrics via SDK |
| ALB | RequestCount, TargetResponseTime, HTTPCode_Target_5XX | (comprehensive by default) |

Critical trap: A scenario says "alarm when EC2 memory exceeds 90%." Memory is NOT a default hypervisor-visible metric — install the CloudWatch agent on the instance to publish it, then alarm on the custom metric.
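The trap above can be sketched in two parts: an agent configuration that collects memory, and an alarm on the resulting custom metric. This is a minimal sketch under stated assumptions, not a definitive setup: it assumes the agent publishes to its default `CWAgent` namespace with an `InstanceId` dimension appended, and the instance ID, topic ARN, and alarm name are hypothetical. The code only builds the request shapes; nothing is sent to AWS.

```python
import json

# Assumed CloudWatch agent configuration fragment: collect memory and disk usage
# and tag each metric with the instance ID.
agent_config = {
    "metrics": {
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["*"]},
        },
    }
}


def memory_alarm_params(instance_id: str, topic_arn: str, threshold: float = 90.0) -> dict:
    """Build keyword arguments in the shape of a PutMetricAlarm request."""
    return {
        "AlarmName": f"high-memory-{instance_id}",  # hypothetical naming scheme
        "Namespace": "CWAgent",                     # agent's default namespace (assumed unchanged)
        "MetricName": "mem_used_percent",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                              # seconds per evaluation period
        "EvaluationPeriods": 2,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }


# Hypothetical identifiers, for illustration only.
params = memory_alarm_params(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:111122223333:ops-alerts",
)
print(json.dumps(agent_config, indent=2))
print(params["Namespace"], params["MetricName"], params["Threshold"])
```

With boto3 installed and credentials configured, a dictionary like `params` could be passed as `cloudwatch.put_metric_alarm(**params)`; the point for the exam is that the alarm targets a custom agent metric, not an `AWS/EC2` one.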

Alarms and automated response

An alarm has three states: OK (the metric is within the threshold), ALARM (the threshold is breached), and INSUFFICIENT_DATA (the alarm has just started or there are not enough data points to decide). Alarm actions include sending an SNS notification, triggering Auto Scaling, performing an EC2 action (stop, terminate, reboot, or recover), or invoking Lambda. Recover moves the instance to healthy hardware while preserving its instance ID and private IP address.

Metric filters turn log patterns (such as counting ERROR lines) into metrics you can alarm on, and subscription filters stream logs in near real time to Lambda, Kinesis, or OpenSearch.
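To make the three alarm states concrete, here is a deliberately simplified model of how a state could be derived from recent data points. It assumes every one of the last `evaluation_periods` points must breach the threshold and treats any missing point as insufficient data; the real service also supports "M out of N" evaluation and configurable missing-data handling, which this sketch ignores. The function name is hypothetical.

```python
from typing import Optional, Sequence


def alarm_state(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    evaluation_periods: int,
) -> str:
    """Return OK, ALARM, or INSUFFICIENT_DATA for a greater-than-threshold alarm.

    `None` marks a period with no data. Simplified model, not CloudWatch's exact logic.
    """
    recent = list(datapoints)[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(v is None for v in recent):
        return "INSUFFICIENT_DATA"
    if all(v > threshold for v in recent):
        return "ALARM"
    return "OK"


print(alarm_state([50, 85, 92, 95], threshold=80, evaluation_periods=3))  # ALARM
print(alarm_state([50, 85, 70], threshold=80, evaluation_periods=2))      # OK
print(alarm_state([90], threshold=80, evaluation_periods=2))              # INSUFFICIENT_DATA
```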

CloudTrail and AWS Config — Audit and Compliance

AWS CloudTrail (the audit log)

CloudTrail records each API call with the identity (IAM principal), action, timestamp, source IP, and request/response details. By default, management events are viewable for 90 days in Event history at no cost; to retain longer or analyze, create a trail that delivers logs to S3.
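As an illustration of the "who did what, when, and from where" fields, the sketch below pulls them out of a single event record. The record is a hypothetical, trimmed example (the ARN, IP address, and bucket name are invented), and `summarize` is an illustrative helper, not part of any AWS SDK.

```python
import json

# Hypothetical, trimmed CloudTrail record; real records carry many more fields.
sample_record = json.loads("""
{
  "eventTime": "2024-01-09T02:00:13Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "DeleteBucket",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.10",
  "userIdentity": {
    "type": "IAMUser",
    "arn": "arn:aws:iam::111122223333:user/alice"
  },
  "requestParameters": {"bucketName": "example-bucket"}
}
""")


def summarize(record: dict) -> dict:
    """Reduce a CloudTrail record to the four audit questions."""
    identity = record.get("userIdentity", {})
    return {
        "who": identity.get("arn", identity.get("type", "unknown")),
        "what": f'{record["eventSource"]}:{record["eventName"]}',
        "when": record["eventTime"],
        "where": record.get("sourceIPAddress", "unknown"),
    }


summary = summarize(sample_record)
print(summary)
```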

| Event type | Examples | Notes |
| --- | --- | --- |
| Management events | RunInstances, DeleteBucket, console sign-in | 90-day history free; trail for long-term |
| Data events | S3 GetObject/PutObject, Lambda Invoke | High volume; charged; off by default |
| Insights events | Unusual call-rate anomalies | Charged; flags spikes like mass deletes |

A single multi-Region trail captures every Region, and an organization trail captures all accounts in AWS Organizations. Log file integrity validation uses SHA-256 hashing and digitally signed digest files so you can detect whether log files were modified or deleted after delivery, which is vital for forensic and audit defensibility.
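The hashing idea behind integrity validation can be sketched in a few lines. This is only an illustration of the principle: it records a SHA-256 hash per log file and later reports files whose content no longer matches or that have gone missing. It is not CloudTrail's actual digest format, and it omits the digital signing and digest chaining that the real feature relies on. File names and helper names are hypothetical.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_digest(log_files: Dict[str, bytes]) -> Dict[str, str]:
    """Record the hash of every log file at delivery time."""
    return {name: sha256_hex(body) for name, body in log_files.items()}


def find_tampered(log_files: Dict[str, bytes], digest: Dict[str, str]) -> List[str]:
    """Return names of files that are missing or whose hash no longer matches."""
    altered = [
        name
        for name, expected in digest.items()
        if name not in log_files or sha256_hex(log_files[name]) != expected
    ]
    return sorted(altered)


logs = {"log-0001.json": b'{"eventName": "RunInstances"}',
        "log-0002.json": b'{"eventName": "DeleteBucket"}'}
digest = build_digest(logs)

logs["log-0002.json"] = b'{"eventName": "GetObject"}'  # simulate an edit after delivery
print(find_tampered(logs, digest))
```

In practice the check is run with the AWS CLI's `aws cloudtrail validate-logs` command against the delivered digest files.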

AWS Config (the configuration recorder)

Config continuously records resource configuration items and builds a per-resource timeline you can rewind. Config rules evaluate desired state and mark resources compliant or not; remediation can auto-fix via SSM Automation; an aggregator rolls up many accounts/Regions.

| Rule | What it enforces |
| --- | --- |
| encrypted-volumes | Attached EBS volumes are encrypted |
| s3-bucket-versioning-enabled | Buckets have versioning on |
| rds-instance-public-access-check | RDS is not publicly reachable |
| iam-root-access-key-check | Root has no access keys |
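The rules above all follow the same pattern: inspect a recorded configuration item and return a compliance verdict. The sketch below shows that pattern for an encryption check in the spirit of `encrypted-volumes`. It assumes a simplified configuration item with `resourceType` and a `configuration.encrypted` flag; the sample items and the function name are hypothetical, and this is not the managed rule's actual implementation.

```python
def evaluate_volume(configuration_item: dict) -> str:
    """Return a Config-style compliance verdict for one configuration item."""
    if configuration_item.get("resourceType") != "AWS::EC2::Volume":
        return "NOT_APPLICABLE"
    encrypted = configuration_item.get("configuration", {}).get("encrypted", False)
    return "COMPLIANT" if encrypted else "NON_COMPLIANT"


# Hypothetical configuration items, trimmed to the fields the check reads.
items = [
    {"resourceType": "AWS::EC2::Volume", "resourceId": "vol-aaa", "configuration": {"encrypted": True}},
    {"resourceType": "AWS::EC2::Volume", "resourceId": "vol-bbb", "configuration": {"encrypted": False}},
    {"resourceType": "AWS::S3::Bucket", "resourceId": "example-bucket", "configuration": {}},
]

for item in items:
    print(item["resourceId"], evaluate_volume(item))
```

A remediation action would then be attached to the NON_COMPLIANT result rather than coded into the check itself.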

The three pillars side by side

| Dimension | CloudWatch | CloudTrail | AWS Config |
| --- | --- | --- | --- |
| Question | How is it performing? | Who did what? | Is it configured correctly? |
| Data | Metrics, logs, alarms | API call records | Config snapshots + history |
| Typical use | Alarm on CPU/memory | Investigate an incident | Enforce/auto-remediate compliance |

On the Exam: "Who terminated the instance at 2 a.m.?" → CloudTrail. "Alert when CPU > 80%" → CloudWatch. "Continuously verify every bucket has versioning and fix violations" → AWS Config rule + SSM remediation.

How the three pillars work together

The services are complementary, and strong architectures use all three. Imagine an unexpected production outage: CloudWatch alarms first detect the symptom (latency spiked, 5XX errors climbed), CloudTrail then reveals the cause (an engineer modified a security group at 1:58 a.m.), and AWS Config shows the exact before-and-after configuration on its resource timeline and flags that the change violated a compliance rule.

A common multi-account design centralizes all three: an organization CloudTrail writes to a dedicated logging account's S3 bucket, a CloudWatch cross-account dashboard aggregates metrics, and a Config aggregator rolls up compliance — giving security teams one authoritative view.

EventBridge and automated remediation

CloudWatch Events evolved into Amazon EventBridge, which reacts to events (including CloudTrail-recorded API calls and Config rule changes) and routes them to targets like Lambda, SNS, or Step Functions. This is the glue for event-driven remediation: when Config flags a public S3 bucket, an EventBridge rule can invoke a Lambda that blocks public access automatically. Distinguish the roles on the exam — CloudWatch alarms watch numeric metric thresholds, while EventBridge rules match event patterns and content.
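The difference between pattern matching and threshold watching is easier to see in code. Below is a minimal sketch of event-pattern matching that covers only exact-value matching on nested keys, which is a small subset of what EventBridge patterns support (no prefix, numeric, or anything-but operators). The `matches` helper and the sample event are hypothetical; the pattern shape follows the commonly documented form for CloudTrail-delivered API calls.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Return True if every key in the pattern is present in the event and
    the event's value is one of the allowed values (lists) or matches the
    nested pattern (dicts). Exact-value matching only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:  # expected is a list of allowed values
            return False
    return True


pattern = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {"eventName": ["DeleteBucket", "PutBucketAcl"]},
}

# Hypothetical event, trimmed to the fields the pattern inspects.
event = {
    "source": "aws.s3",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventName": "DeleteBucket", "sourceIPAddress": "203.0.113.10"},
}

print(matches(pattern, event))
print(matches(pattern, {**event, "detail": {"eventName": "GetObject"}}))
```

Note that there is no number to compare against a threshold anywhere in the pattern, which is why this kind of requirement maps to an EventBridge rule and not to a CloudWatch alarm.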

A requirement such as "when an unauthorized API call happens, trigger a workflow" calls for EventBridge reacting to CloudTrail, not a CloudWatch alarm. Also remember that CloudWatch Logs log groups never expire their data until you set a retention period, which silently grows storage cost; this is a frequently tested cost-optimization detail.
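For the retention detail, a small helper can show what "setting it" involves. CloudWatch Logs accepts only specific retention values, so this sketch rounds a desired number of days up to the nearest allowed one and builds arguments in the shape of a PutRetentionPolicy request. The list of allowed values reflects the API documentation as I understand it and should be checked against the current reference; the log group name and helper name are hypothetical, and nothing is sent to AWS.

```python
import bisect

# Retention periods (in days) accepted by CloudWatch Logs, assumed from the API docs.
ALLOWED_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400,
    545, 731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
]


def retention_params(log_group: str, desired_days: int) -> dict:
    """Round desired_days up to the nearest allowed value and build the request."""
    if desired_days < 1:
        raise ValueError("desired_days must be at least 1")
    idx = bisect.bisect_left(ALLOWED_RETENTION_DAYS, desired_days)
    if idx == len(ALLOWED_RETENTION_DAYS):
        raise ValueError("desired_days exceeds the longest allowed retention")
    return {"logGroupName": log_group, "retentionInDays": ALLOWED_RETENTION_DAYS[idx]}


print(retention_params("/app/api", 45))  # rounds 45 up to 60
print(retention_params("/app/api", 30))  # 30 is already allowed
```

With boto3, a dictionary like this could be passed as `logs.put_retention_policy(**params)`.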

Test Your Knowledge

A security team must determine which IAM principal deleted an S3 bucket last Tuesday and from what source IP. Which service provides this?


An operations team needs an alarm when EC2 memory utilization exceeds 90%. What must they do first?


A company must continuously verify that all EBS volumes are encrypted and automatically remediate any that are not. Which approach fits best?
