5.3 Amazon API Gateway
Key Takeaways
- API Gateway is a fully managed front door for REST, HTTP, and WebSocket APIs that handles authorization, throttling, caching, and versioning at scale.
- REST APIs cost $3.50 per million requests and include caching, WAF, request validation, and usage plans; HTTP APIs cost $1.00 per million (~71% cheaper) but omit those features, most notably caching.
- Caching is REST-API-only, with TTL from 0–3600 seconds (default 300) and cache sizes from 0.5 GB up to 237 GB billed per hour while enabled.
- Default account throttling is 10,000 requests/second with a 5,000 burst per Region; usage plans tie API keys to per-client rate and quota limits.
- AWS Service integration lets API Gateway call S3, SQS, DynamoDB, or Step Functions directly, removing the need for a pass-through Lambda function.
Quick Answer: Amazon API Gateway is a fully managed service for creating and securing APIs. REST APIs ($3.50 per million requests) add caching, AWS WAF, request validation, and usage plans. HTTP APIs ($1.00 per million, roughly 71% cheaper) are leaner and faster but have no caching. WebSocket APIs support real-time two-way messaging. All three commonly front Lambda for serverless backends.
Choosing an API Type
| Type | Price | Distinctive features | Best for |
|---|---|---|---|
| REST API | $3.50 / million | Caching, WAF, usage plans, API keys, request validation, canary deploys | Full-featured or monetized APIs |
| HTTP API | $1.00 / million | Simple routing, JWT/OIDC authorizers, native CORS, lower latency | Lambda proxy, cost-sensitive APIs |
| WebSocket API | $1.00 / million msgs | Persistent bidirectional connections | Chat, live dashboards, notifications |
The ~71% saving follows directly from the list prices: ($3.50 - $1.00) / $3.50 ≈ 71% per million requests. A common reason to stay on REST is caching, which HTTP APIs do not offer.
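The arithmetic behind the roughly 71% figure can be checked with a short sketch. It uses only the per-million prices quoted in the table; the 50 million monthly request volume is a hypothetical value chosen for illustration, and the calculation ignores data transfer, caching charges, and any volume discounts.

```python
# Per-million request prices as quoted in the table above.
REST_PER_MILLION = 3.50
HTTP_PER_MILLION = 1.00


def monthly_cost(requests: int, price_per_million: float) -> float:
    """Request charges only; ignores data transfer, caching, and discounts."""
    return requests / 1_000_000 * price_per_million


def saving_percent(expensive: float, cheap: float) -> float:
    """Relative saving of the cheaper option, as a percentage."""
    return (expensive - cheap) / expensive * 100


requests = 50_000_000  # hypothetical monthly volume
rest = monthly_cost(requests, REST_PER_MILLION)
http = monthly_cost(requests, HTTP_PER_MILLION)
print(f"REST: ${rest:.2f}  HTTP: ${http:.2f}  "
      f"saving: {saving_percent(rest, http):.1f}%")
# prints: REST: $175.00  HTTP: $50.00  saving: 71.4%
```

The percentage is independent of volume, since both prices scale linearly with request count in this simplified model.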
Authorization, Caching, and Throttling
Authorization options:
| Method | What it does |
|---|---|
| IAM (SigV4) | Identity-based access, ideal for AWS-to-AWS and internal callers |
| Amazon Cognito | User-pool tokens validate end-user identity |
| Lambda authorizer | Custom token/request logic against any identity source |
| API keys + usage plans | Identify and meter callers — for throttling/quota, not authentication |
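To make the Lambda authorizer row concrete, here is a minimal sketch of a token-based authorizer. The function name `authorizer_handler`, the `"allow-me"` token, and the `"example-user"` principal are placeholders invented for illustration; a real authorizer would verify a signed token against its identity source. The returned shape (a principal plus an IAM policy for `execute-api:Invoke`) is what API Gateway expects from this kind of authorizer.

```python
def authorizer_handler(event, context):
    """TOKEN-type Lambda authorizer: return an IAM policy for the caller."""
    token = event.get("authorizationToken", "")

    # Placeholder check only (assumption): swap in real token verification.
    effect = "Allow" if token == "allow-me" else "Deny"

    return {
        "principalId": "example-user",  # hypothetical caller identifier
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    # ARN of the method being called, supplied by API Gateway.
                    "Resource": event["methodArn"],
                }
            ],
        },
    }
```

Returning an explicit `Deny` policy, rather than raising an error, keeps the decision visible in the policy document.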
Caching (REST APIs only): TTL is configurable from 0 to 3600 seconds (default 300). Cache size ranges from 0.5 GB to 237 GB, billed per hour while enabled — roughly $0.02/hr at 0.5 GB up to $3.80/hr at 237 GB — so caching costs accrue even at zero traffic. Each deployment stage has its own cache, and you can invalidate per-key or flush the whole cache.
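Because the cache is billed hourly regardless of traffic, it helps to translate the hourly rates into a monthly figure. The sketch below uses only the two approximate rates quoted above; the 730-hour month is a common billing approximation and an assumption here, and actual prices vary by Region.

```python
HOURS_PER_MONTH = 730  # assumption: 365 * 24 / 12, rounded

# Approximate hourly rates quoted in the text, keyed by cache size in GB.
CACHE_HOURLY_USD = {0.5: 0.02, 237: 3.80}


def monthly_cache_cost(size_gb: float, hours: int = HOURS_PER_MONTH) -> float:
    """Cost of keeping a cache of the given size enabled for `hours` hours."""
    return CACHE_HOURLY_USD[size_gb] * hours


for size in (0.5, 237):
    print(f"{size} GB cache: ${monthly_cache_cost(size):.2f} per month")
# prints:
# 0.5 GB cache: $14.60 per month
# 237 GB cache: $2774.00 per month
```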
Throttling: the default account limit is 10,000 requests/second with a 5,000-request burst per Region (a token-bucket model). You can override limits at the stage, method, and per-client (usage-plan) level. Exceeding the limit returns 429 Too Many Requests.
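Per-client limits are set by creating a usage plan and attaching an API key to it. The sketch below assumes `apigw` is a boto3 API Gateway client (for example `boto3.client("apigateway")`) and that the API, stage, and key already exist; the helper name `create_metered_plan` and the plan naming scheme are invented for illustration.

```python
def create_metered_plan(apigw, api_id, stage, key_id, rate, burst, monthly_quota):
    """Create a usage plan with throttle and quota, then attach an API key.

    `apigw` is assumed to be a boto3 "apigateway" client.
    """
    plan = apigw.create_usage_plan(
        name=f"plan-{key_id}",  # hypothetical naming scheme
        apiStages=[{"apiId": api_id, "stage": stage}],
        # Steady-state requests/second and bucket depth for this client.
        throttle={"rateLimit": float(rate), "burstLimit": int(burst)},
        # Total requests allowed per calendar month.
        quota={"limit": int(monthly_quota), "period": "MONTH"},
    )
    apigw.create_usage_plan_key(
        usagePlanId=plan["id"], keyId=key_id, keyType="API_KEY"
    )
    return plan["id"]
```

A call such as `create_metered_plan(client, "abc123", "prod", key_id, rate=50, burst=100, monthly_quota=100_000)` would cap that one customer well below the account-wide limit.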
Integration Types
| Integration | Behavior | Use case |
|---|---|---|
| Lambda proxy | Whole request passed to Lambda; Lambda returns full response | Most common serverless pattern |
| Lambda custom | Mapping templates transform request/response | Legacy or complex transforms |
| HTTP proxy | Forwards to any HTTP endpoint (ALB, EC2, external) | Existing backends |
| AWS Service | Direct call to S3, SQS, DynamoDB, Step Functions | Skip the pass-through Lambda |
| Mock | Returns a canned response with no backend | Testing, CORS preflight |
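The Lambda proxy row is the pattern most worth seeing in code. The following is a minimal sketch of a handler for a REST API proxy integration: API Gateway passes the whole request as `event`, and the function must return a status code, headers, and a string body. The `name` query parameter and the greeting payload are invented for illustration.

```python
import json


def handler(event, context):
    """Lambda proxy integration: receive the full request, return the full response."""
    # queryStringParameters is None (not an empty dict) when no query string is sent.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # hypothetical parameter

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # The body must be a string, so the payload is serialized explicitly.
        "body": json.dumps({"message": f"hello, {name}", "path": event.get("path")}),
    }
```

With Lambda custom integration, by contrast, the mapping templates would build the response, and the function could return the bare payload.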
On the Exam: "Serverless REST API, least operational overhead" → API Gateway + Lambda proxy. "Cache responses for 5 minutes" → REST API (HTTP APIs cannot cache). "Throttle and meter each customer" → usage plans + API keys. "Drop a message on SQS with no Lambda" → AWS Service integration.
Edge Optimization, Stages, and Security
API Gateway REST APIs offer three endpoint types that decide where requests terminate, and the exam tests matching them to a scenario:
| Endpoint type | Where it lives | Use when |
|---|---|---|
| Edge-optimized | Routed through CloudFront edge locations | Global clients, latency-sensitive |
| Regional | Served from the API's Region | Same-Region clients, or your own CloudFront in front |
| Private | Reachable only via an interface VPC endpoint | Internal-only APIs, no internet exposure |
A stage (such as dev or prod) is a named reference to a deployment, with its own throttling, caching, logging, and stage variables. Canary deployments on a stage shift a configurable percentage of traffic to a new version, enabling safe rollouts and quick rollback.
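A canary rollout can be sketched as a single deployment call that carries canary settings. This assumes `apigw` is a boto3 API Gateway client and that the stage already exists; the helper name `deploy_canary` is invented for illustration, and promoting or removing the canary afterwards is a separate stage update not shown here.

```python
def deploy_canary(apigw, api_id, stage, percent):
    """Deploy the current API definition as a canary on an existing stage.

    `apigw` is assumed to be a boto3 "apigateway" client.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")

    return apigw.create_deployment(
        restApiId=api_id,
        stageName=stage,
        canarySettings={
            # Share of stage traffic routed to the new deployment.
            "percentTraffic": float(percent),
            # Give the canary its own cache rather than sharing the stage cache.
            "useStageCache": False,
        },
    )
```

Starting with a small percentage, such as 10, limits the blast radius while logs and metrics for the canary are reviewed.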
For protection, attach AWS WAF to a REST API to filter SQL injection, cross-site scripting, and rate-based floods at the application layer — WAF integrates with REST APIs but not HTTP APIs, another reason to choose REST for public, security-sensitive endpoints. Combine WAF with usage-plan throttling to defend against abusive callers.
Common Traps
| Trap | Reality |
|---|---|
| "HTTP API caches like REST" | Only REST APIs support response caching |
| "API keys authenticate users" | API keys identify/meter callers; use Cognito or a Lambda authorizer for auth |
| "WAF protects HTTP APIs" | WAF integrates with REST APIs (and CloudFront), not HTTP APIs |
| "Caching is free when traffic is zero" | Cache is billed per hour while enabled regardless of traffic |
| "429 means the backend is down" | 429 Too Many Requests is API Gateway throttling, not a backend failure |
Finally, weigh cost realistically: at high request volumes the $2.50-per-million difference between REST ($3.50) and HTTP ($1.00) APIs is large, so default to HTTP APIs for simple Lambda-proxy routing and reserve REST APIs for when you genuinely need caching, WAF, usage plans, or request validation.
Throttling Math and Resilience
Throttling uses a token-bucket model defined by two numbers: a steady-state rate (requests per second) and a burst (the bucket depth). With the default 10,000 requests/second and a 5,000 burst, a sudden spike is absorbed until the 5,000-token bucket is empty; after that, tokens refill at 10,000 per second, so sustained throughput is capped at the steady rate. Excess requests receive 429 responses, which well-behaved clients retry with exponential backoff. Because the account limit is shared by every API in the Region, a single misbehaving API can starve the others, which is why per-method and usage-plan limits exist to fence off capacity.
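The token-bucket behavior can be illustrated with a small simulation. This is a generic textbook token bucket using the default numbers above, not a description of API Gateway's internal implementation; the class name and the spike sizes are chosen for illustration.

```python
class TokenBucket:
    """Generic token bucket: `rate` tokens/second refill, capacity `burst`."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0             # timestamp of the previous request

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, never exceeding the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a throttled request; the caller would see a 429


bucket = TokenBucket(rate=10_000, burst=5_000)

# 8,000 requests arrive at the same instant: only the bucket depth is served.
spike = sum(bucket.allow(now=0.0) for _ in range(8_000))

# A quarter second later, 0.25 s * 10,000/s = 2,500 tokens have refilled.
later = sum(bucket.allow(now=0.25) for _ in range(3_000))

print(spike, later)  # prints: 5000 2500
```

The first spike shows the burst limit at work, and the second shows the steady rate governing how quickly capacity returns.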
For durability under spikes, a common pattern is API Gateway → SQS (via AWS Service integration) so bursts are buffered in a queue and consumers drain them at a safe pace, decoupling the front door from backend capacity and smoothing traffic without dropping requests.
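The API Gateway to SQS pattern comes down to one integration definition. The sketch below builds the arguments for a REST API AWS Service integration that sends the request body to a queue; the helper name is invented, and the Region, account ID, queue name, and role ARN are caller-supplied assumptions. The role must allow `sqs:SendMessage` on the queue.

```python
def sqs_integration_params(region, account_id, queue_name, role_arn):
    """Arguments for an AWS Service integration that calls SQS SendMessage."""
    return {
        "type": "AWS",
        "integrationHttpMethod": "POST",
        # Path-style URI addressing the queue through the SQS service endpoint.
        "uri": f"arn:aws:apigateway:{region}:sqs:path/{account_id}/{queue_name}",
        # IAM role that API Gateway assumes to call SQS.
        "credentials": role_arn,
        "requestParameters": {
            # Static header values are wrapped in single quotes.
            "integration.request.header.Content-Type":
                "'application/x-www-form-urlencoded'",
        },
        "requestTemplates": {
            # Mapping template: turn the JSON body into a form-encoded SQS call.
            "application/json":
                "Action=SendMessage&MessageBody=$util.urlEncode($input.body)",
        },
    }
```

With a boto3 API Gateway client, these would be passed along the lines of `client.put_integration(restApiId=..., resourceId=..., httpMethod="POST", **params)`, where the IDs identify an existing REST API resource and method.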
Practice Questions
A team wants a serverless REST API that invokes Lambda and caches responses for 5 minutes to cut backend load. Which API Gateway type must they use?
An architect wants API Gateway to place incoming messages directly onto an Amazon SQS queue without running a Lambda function. Which integration type achieves this?
A public API is returning HTTP 429 responses under heavy load. What is the most likely cause and the right per-customer fix?