4.2 Performance & Caching
Key Takeaways
- Match the cache to the layer: API Gateway cache (HTTP responses per stage), CloudFront (edge static + dynamic), ElastiCache (any data source), and DAX (DynamoDB reads only).
- DAX serves eventually consistent GetItem/Query results in microseconds and is write-through; it does not accelerate strongly consistent reads or speed up writes.
- Lambda allocates CPU proportionally to memory, so raising memory can finish CPU-bound work faster and sometimes cost the same or less; tune with Lambda Power Tuning.
- Provisioned concurrency pre-warms execution environments to eliminate cold starts; initialize SDK clients and DB connections outside the handler to reuse them across warm invocations.
- DynamoDB hot partitions come from low-cardinality or skewed partition keys; fix with better key design (high cardinality, write sharding), not just more capacity.
Pick the cache by what you are accelerating
The exam loves cache-selection questions, and the trap is choosing a cache that sits at the wrong layer. Anchor on the data you are speeding up.
| Cache | Accelerates | Notes |
|---|---|---|
| API Gateway cache | HTTP responses | Enabled per stage, keyed by request parameters, TTL up to 3600 s, billed by cache size (0.5 GB-237 GB) |
| CloudFront | Static + dynamic content at the edge | 700+ global POPs, lowers origin load and latency, honors Cache-Control/TTL |
| ElastiCache | Any data source | General-purpose in-memory; Redis/Valkey for persistence, replication, pub/sub, sorted sets; Memcached for simple, multi-threaded, horizontally scaled caching |
| DAX | DynamoDB reads only | Microsecond reads, eventually consistent, write-through, near-zero code change |
DAX specifics
Amazon DynamoDB Accelerator (DAX) keeps an item cache (for GetItem/BatchGetItem) and a query cache (for Query/Scan), each with its own TTL. It returns eventually consistent data, so it does not serve strongly consistent reads from cache: DAX passes those through to DynamoDB and does not cache the result. Writes are write-through: DAX writes to the table first, then populates the cache, so it never accelerates writes. For read-heavy, repeat-read workloads it cuts both latency (from single-digit milliseconds to microseconds) and consumed read capacity. The code change is minimal: point the DAX client at the cluster endpoint.
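The read and write paths described above can be sketched with a toy model. This is not the DAX client or its API; `ToyTable`, `ToyItemCache`, and the 300-second TTL are hypothetical names and values chosen only to illustrate write-through behavior and the pass-through of strongly consistent reads.

```python
import time


class ToyTable:
    """Stand-in for a DynamoDB table; counts reads so cache hits are visible."""

    def __init__(self):
        self.items = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.items.get(key)

    def put(self, key, value):
        self.items[key] = value


class ToyItemCache:
    """Write-through item cache with a TTL (illustrative, not the DAX API)."""

    def __init__(self, table, ttl_seconds=300.0, clock=time.monotonic):
        self.table = table
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get_item(self, key, consistent_read=False):
        if consistent_read:
            # Strongly consistent reads go to the table and are not cached.
            return self.table.get(key)
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]  # cache hit: no table read
        value = self.table.get(key)  # miss or expired entry: read through
        self._entries[key] = (value, self.clock() + self.ttl)
        return value

    def put_item(self, key, value):
        # Write-through: the table is written first, then the cache entry.
        self.table.put(key, value)
        self._entries[key] = (value, self.clock() + self.ttl)


table = ToyTable()
cache = ToyItemCache(table)
cache.put_item("player#1", {"rank": 7})
cache.get_item("player#1")                        # served from the cache
cache.get_item("player#1", consistent_read=True)  # goes to the table
```

In this model the write costs one table write either way, which is why a write-through cache helps repeat reads but does nothing for write latency.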
Lambda performance
- Memory drives CPU. Lambda allocates vCPU proportionally to configured memory across the 128 MB-10,240 MB range; at ~1,769 MB a function gets the equivalent of one full vCPU. More memory can make a CPU-bound function finish faster, often at the same or lower total cost. Use AWS Lambda Power Tuning (a Step Functions state machine) to find the cost/speed sweet spot empirically.
- Cold starts occur on the first invocation of a new execution environment (the init phase that loads the runtime and your imports). Provisioned concurrency pre-initializes a set number of environments so latency-sensitive paths skip cold starts; reserved concurrency caps or guarantees a function's share of the account's default limit of 1,000 concurrent executions per Region but does not pre-warm.
- Connection reuse: initialize SDK clients, HTTP keep-alive, and database connections outside the handler so warm invocations reuse them instead of reconnecting. For relational databases behind Lambda, add Amazon RDS Proxy to pool and share connections and avoid exhausting the database's connection limit during bursts.
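The connection-reuse point can be shown with a minimal sketch. `FakeClient` is a hypothetical stand-in for an SDK client or database connection, used here so the example runs without AWS credentials; in a real function the same position would hold the actual client constructor.

```python
import itertools

_init_counter = itertools.count(1)


class FakeClient:
    """Hypothetical stand-in for an SDK client or database connection."""

    def __init__(self):
        # In real code this is where connection setup and auth would happen.
        self.instance_id = next(_init_counter)

    def fetch(self, key):
        return f"value-for-{key}"


# Module scope runs once per execution environment (during init),
# so every warm invocation reuses this same object.
CLIENT = FakeClient()


def handler(event, context):
    # Only per-request work belongs inside the handler.
    return {
        "client_instance": CLIENT.instance_id,
        "value": CLIENT.fetch(event["key"]),
    }
```

Calling `handler` repeatedly in the same process reports the same `client_instance` each time, which mirrors what happens across warm invocations of one execution environment.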
DynamoDB performance
Every partition delivers up to 3,000 read capacity units (RCU) and 1,000 write capacity units (WCU). Exceed that on a single key and you throttle even when table-level capacity is plentiful.
- Hot partitions arise when a partition key has low cardinality or one very popular value, concentrating traffic on a few partitions. Fix with a high-cardinality partition key or write sharding (append a random or calculated suffix), not just more provisioned capacity. Adaptive capacity helps but cannot fully rescue a fundamentally skewed key.
- GSI design: a Global Secondary Index serves a new access pattern with its own partition/sort keys and its own capacity. Under-provisioning a GSI in provisioned mode throttles the base table writes, a frequent gotcha. Project only the attributes you query to keep the index lean and cheap.
- Batch and parallelism: BatchGetItem/BatchWriteItem reduce round trips; a parallel Scan with Segment/TotalSegments speeds full-table reads but burns RCU, so prefer Query whenever the access pattern allows.
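Write sharding, mentioned in the hot-partition bullet above, might look like the following sketch. The function names, the `#` separator, and `SHARD_COUNT = 10` are assumptions for illustration; the right shard count depends on the write rate a single logical key has to absorb.

```python
import hashlib
import random

SHARD_COUNT = 10  # assumed value; size it to the write rate you need


def calculated_shard_key(base_key: str, item_id: str, shards: int = SHARD_COUNT) -> str:
    """Derive the suffix from item_id so a reader can recompute the exact shard."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    suffix = int.from_bytes(digest[:4], "big") % shards
    return f"{base_key}#{suffix}"


def random_shard_key(base_key: str, shards: int = SHARD_COUNT) -> str:
    """Random suffix: spreads writes, but reads must fan out to all shards."""
    return f"{base_key}#{random.randrange(shards)}"


def all_shard_keys(base_key: str, shards: int = SHARD_COUNT) -> list[str]:
    """Every partition key a reader has to query to reassemble one logical key."""
    return [f"{base_key}#{i}" for i in range(shards)]
```

The trade-off is on the read side: a calculated suffix lets a point read go straight to one shard, while a random suffix forces a query per shard (for example, over `all_shard_keys("PENDING")`) and a merge in the application.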
Worked example: when DAX is wrong
A leaderboard reads each player's current rank thousands of times per second but updates ranks every second and requires the latest value. DAX looks tempting for the read volume, but because it serves only eventually consistent data and ranks change every second, stale reads are unacceptable — DAX is the wrong tool here. The correct answers are strongly consistent reads on a well-sharded key, or moving the hot counter to ElastiCache for Redis with application-managed freshness. The exam tests exactly this trade-off: DAX wins on repeat reads of slowly changing data, and loses the moment strong consistency is required.
Lambda cost-vs-speed math
Lambda bills on GB-seconds (allocated memory times duration). Suppose a function runs 4 s at 256 MB (1 GB-s). Double memory to 512 MB and, because CPU doubles, a CPU-bound function may finish in ~2 s — still 1 GB-s, same cost but half the latency. Push to 1,024 MB and it might run 1 s (also 1 GB-s) while feeling four times faster to the caller. This is why "increase memory" is frequently both faster and cost-neutral, and why Lambda Power Tuning charts a U-shaped cost curve you can read off directly.
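The arithmetic above can be checked with a few lines of code. The helper names are hypothetical, and `cpu_bound_duration` bakes in the idealized assumption that duration falls in exact proportion to memory; real functions rarely scale that cleanly, and single-threaded code stops benefiting once it already has a full vCPU.

```python
def gb_seconds(memory_mb: int, duration_s: float) -> float:
    """Billed compute: allocated memory in GB multiplied by duration in seconds."""
    return (memory_mb / 1024) * duration_s


def cpu_bound_duration(base_memory_mb: int, base_duration_s: float, memory_mb: int) -> float:
    """Idealized model: duration scales inversely with memory (and thus CPU)."""
    return base_duration_s * base_memory_mb / memory_mb


for memory_mb in (256, 512, 1024):
    duration = cpu_bound_duration(256, 4.0, memory_mb)
    print(memory_mb, duration, gb_seconds(memory_mb, duration))
```

Under this model every row comes out at 1.0 GB-s, which is the cost-neutral case from the text. Measured durations from a tool such as Lambda Power Tuning replace the idealized model in practice.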
Caching strategies & API Gateway throttling
Beyond picking a layer, know two cache patterns: cache-aside (the app checks the cache, loads from the source on a miss, and writes the result back to the cache; the usual pattern with ElastiCache, whether Redis/Valkey or Memcached) versus write-through (every write updates the cache, as DAX does). Cache-aside risks stale data and a thundering herd when many clients miss on a cold cache at once; write-through keeps the cache fresh at the cost of write latency. Separately, API Gateway protects backends with throttling (a steady-state rate and a burst bucket) and usage plans keyed to API keys. These shape load but are not a substitute for a cache or for authentication.
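The "steady-state rate and a burst bucket" model is a token bucket, which can be sketched generically as below. This is an illustration of the algorithm, not API Gateway's implementation; the class name and the injectable `clock` parameter are assumptions made so the behavior can be exercised deterministically.

```python
import time


class TokenBucket:
    """Generic token bucket: `rate_per_s` refills tokens, `burst` caps the bucket."""

    def __init__(self, rate_per_s: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = burst
        self.clock = clock
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # a throttled request; API Gateway responds with HTTP 429
```

With `rate_per_s=2` and `burst=3`, three back-to-back requests pass, the fourth is rejected, and one second later two more tokens are available.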
An application reads the same small set of DynamoDB items thousands of times per second and needs microsecond latency, but the data only changes a few times per hour. Which addition requires the least code change and best fits?
A CPU-bound Lambda function configured at 256 MB runs for 4 seconds and is slow. The team wants to reduce duration and possibly cost. What should they try first?
A DynamoDB table throttles on writes during a daily batch even though provisioned write capacity is well above the aggregate request rate. The partition key is the order's status ("PENDING", "SHIPPED", "DONE"). What is the best fix?