5.3 Amazon API Gateway

Key Takeaways

  • API Gateway is a fully managed front door for REST, HTTP, and WebSocket APIs that handles authorization, throttling, caching, and versioning at scale.
  • REST APIs cost $3.50 per million requests and add caching, WAF, request validation, and usage plans; HTTP APIs cost $1.00 per million (~71% cheaper) but drop caching.
  • Caching is REST-API-only, with TTL from 0–3600 seconds (default 300) and cache sizes from 0.5 GB up to 237 GB billed per hour while enabled.
  • Default account throttling is 10,000 requests/second with a 5,000 burst per Region; usage plans tie API keys to per-client rate and quota limits.
  • AWS Service integration lets API Gateway call S3, SQS, DynamoDB, or Step Functions directly, removing the need for a pass-through Lambda function.
Last updated: June 2026

Quick Answer: Amazon API Gateway is a fully managed service for creating and securing APIs. REST APIs ($3.50 per million requests) add caching, AWS WAF, request validation, and usage plans. HTTP APIs ($1.00 per million, roughly 71% cheaper) are leaner and faster but have no caching. WebSocket APIs support real-time two-way messaging. All three commonly front Lambda for serverless backends.

Choosing an API Type

| Type | Price | Distinctive features | Best for |
|---|---|---|---|
| REST API | $3.50 / million requests | Caching, WAF, usage plans, API keys, request/response validation, canary deploys | Full-featured or monetized APIs |
| HTTP API | $1.00 / million requests | Simple routing, JWT/OIDC authorizers, native CORS, lower latency | Lambda proxy, cost-sensitive APIs |
| WebSocket API | $1.00 / million messages | Persistent bidirectional connections | Chat, live dashboards, notifications |

The ~71% saving comes directly from $1.00 versus $3.50 per million requests. The single most common reason to stay on REST is caching, which HTTP APIs do not offer.
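
The arithmetic behind that figure can be sketched in a few lines. This is an illustration only: it uses the flat per-million prices quoted in this section, ignores any volume tiers or data-transfer charges, and the 100-million-request workload is a made-up example.

```python
# Flat per-million request prices quoted in this section.
# Assumption: volume tiers and data-transfer charges are ignored.
REST_PRICE_PER_MILLION = 3.50
HTTP_PRICE_PER_MILLION = 1.00


def request_cost(requests: int, price_per_million: float) -> float:
    """Return the request charge in dollars for a given request count."""
    return requests / 1_000_000 * price_per_million


def http_saving_fraction() -> float:
    """Fraction of the REST request charge saved by using HTTP API pricing."""
    return (REST_PRICE_PER_MILLION - HTTP_PRICE_PER_MILLION) / REST_PRICE_PER_MILLION


if __name__ == "__main__":
    monthly_requests = 100_000_000  # hypothetical workload
    rest = request_cost(monthly_requests, REST_PRICE_PER_MILLION)
    http = request_cost(monthly_requests, HTTP_PRICE_PER_MILLION)
    print(f"REST: ${rest:.2f}  HTTP: ${http:.2f}  saving: {http_saving_fraction():.1%}")
```

At 100 million requests a month this works out to $350 on REST versus $100 on HTTP, a saving of about 71.4%.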

Authorization, Caching, and Throttling

Authorization options:

| Method | What it does |
|---|---|
| IAM (SigV4) | Identity-based access, ideal for AWS-to-AWS and internal callers |
| Amazon Cognito | User-pool tokens validate end-user identity |
| Lambda authorizer | Custom token/request logic against any identity source |
| API keys + usage plans | Identify and meter callers; for throttling/quota, not authentication |
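
To make the Lambda authorizer row concrete, here is a minimal sketch of a token-based authorizer for a REST API. It assumes the standard TOKEN authorizer event fields (`authorizationToken`, `methodArn`). The `VALID_TOKENS` lookup is a hypothetical stand-in for real validation such as verifying a JWT signature.

```python
# Hypothetical token store, for illustration only; a real authorizer would
# verify a signed token (for example a JWT) against an identity provider.
VALID_TOKENS = {"demo-token": "user-123"}


def build_policy(principal_id: str, effect: str, method_arn: str) -> dict:
    """Build the IAM policy document API Gateway expects from an authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": method_arn,
                }
            ],
        },
    }


def handler(event, context):
    """Allow the call only when the presented token is known."""
    token = event.get("authorizationToken", "")
    principal = VALID_TOKENS.get(token)
    if principal is None:
        return build_policy("anonymous", "Deny", event["methodArn"])
    return build_policy(principal, "Allow", event["methodArn"])
```

The function returns an Allow or Deny policy for the requested method ARN, and API Gateway enforces whichever policy it receives.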

Caching (REST APIs only): TTL is configurable from 0 to 3600 seconds (default 300). Cache size ranges from 0.5 GB to 237 GB, billed per hour while enabled — roughly $0.02/hr at 0.5 GB up to $3.80/hr at 237 GB — so caching costs accrue even at zero traffic. Each deployment stage has its own cache, and you can invalidate per-key or flush the whole cache.
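
Because the cache is billed by the hour, its monthly cost is easy to estimate. The sketch below uses only the two hourly prices quoted above and assumes a 730-hour month; intermediate cache sizes are omitted because this section does not give their prices.

```python
# Hourly cache prices quoted in this section, keyed by cache size in GB.
CACHE_HOURLY_PRICE = {0.5: 0.02, 237: 3.80}

HOURS_PER_MONTH = 730  # assumption: average month length in hours


def monthly_cache_cost(size_gb: float, hours: int = HOURS_PER_MONTH) -> float:
    """Return the cache charge in dollars; traffic volume does not enter into it."""
    return CACHE_HOURLY_PRICE[size_gb] * hours


if __name__ == "__main__":
    for size in CACHE_HOURLY_PRICE:
        print(f"{size} GB cache: ${monthly_cache_cost(size):,.2f} per month")
```

Under these assumptions the smallest cache costs about $14.60 a month and the largest about $2,774, whether or not any request ever hits it.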

Throttling: the default account limit is 10,000 requests/second with a 5,000-request burst per Region (a token-bucket model). You can override limits at the stage, method, and per-client (usage-plan) level. Exceeding the limit returns 429 Too Many Requests.
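
On the client side, a 429 is usually handled by retrying with exponential backoff. The following is a minimal sketch of that pattern, not an AWS SDK feature: `send` is a hypothetical callable that performs one request and returns a `(status, body)` pair, and the delay values are arbitrary defaults.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry `send` while it returns 429, waiting longer after each attempt."""
    status, body = 429, ""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff capped at max_delay, with full jitter so
            # many throttled clients do not retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    # Still throttled after the final attempt: return the last 429 to the caller.
    return status, body
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can substitute a no-op for the real delay.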

Integration Types

| Integration | Behavior | Use case |
|---|---|---|
| Lambda proxy | Whole request passed to Lambda; Lambda returns full response | Most common serverless pattern |
| Lambda custom | Mapping templates transform request/response | Legacy or complex transforms |
| HTTP proxy | Forwards to any HTTP endpoint (ALB, EC2, external) | Existing backends |
| AWS Service | Direct call to S3, SQS, DynamoDB, Step Functions | Skip the pass-through Lambda |
| Mock | Returns a canned response with no backend | Testing, CORS preflight |
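
As an illustration of the Lambda proxy row, here is a minimal handler sketch. With proxy integration the function receives the whole request as `event` and must return the full response (status code, headers, and a string body). The `name` query parameter and the greeting payload are invented for the example.

```python
import json


def handler(event, context):
    """Handle a Lambda proxy request and return the complete HTTP response."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # hypothetical parameter
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # The body must be a string, so the payload is serialized here.
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

A Lambda custom integration would instead rely on mapping templates in API Gateway to shape the request and response, so the function itself would not build the status code and headers.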

On the Exam: "Serverless REST API, least operational overhead" → API Gateway + Lambda proxy. "Cache responses for 5 minutes" → REST API (HTTP APIs cannot cache). "Throttle and meter each customer" → usage plans + API keys. "Drop a message on SQS with no Lambda" → AWS Service integration.

Edge Optimization, Stages, and Security

API Gateway REST APIs offer three endpoint types that determine where requests terminate; the exam tests whether you can match each type to a scenario:

| Endpoint type | Where it lives | Use when |
|---|---|---|
| Edge-optimized | Routed through CloudFront edge locations | Global clients, latency-sensitive |
| Regional | Served from the API's Region | Same-Region clients, or your own CloudFront in front |
| Private | Reachable only via an interface VPC endpoint | Internal-only APIs, no internet exposure |

Stages (such as dev, prod) are independent deployments with their own throttling, caching, logging, and stage variables. Canary deployments on a stage shift a configurable percentage of traffic to a new version, enabling safe rollouts and quick rollback.

For protection, attach AWS WAF to a REST API to filter SQL injection, cross-site scripting, and rate-based floods at the application layer. WAF integrates with REST APIs but not HTTP APIs, which is another reason to choose REST for public, security-sensitive endpoints. Combine WAF with usage-plan throttling to defend against abusive callers.

Common Traps

| Trap | Reality |
|---|---|
| "HTTP API caches like REST" | Only REST APIs support response caching |
| "API keys authenticate users" | API keys identify/meter callers; use Cognito or a Lambda authorizer for auth |
| "WAF protects HTTP APIs" | WAF integrates with REST APIs (and CloudFront), not HTTP APIs |
| "Caching is free when traffic is zero" | Cache is billed per hour while enabled regardless of traffic |
| "429 means the backend is down" | 429 Too Many Requests is API Gateway throttling, not a backend failure |

Finally, weigh cost realistically: at high request volumes the $2.50-per-million difference between REST ($3.50) and HTTP ($1.00) APIs is large, so default to HTTP APIs for simple Lambda-proxy routing and reserve REST APIs for when you genuinely need caching, WAF, usage plans, or request validation.

Throttling Math and Resilience

Throttling uses a token-bucket model defined by two numbers: a steady-state rate (requests per second) and a burst (the bucket depth). With the default 10,000 requests/second and a 5,000 burst, a sudden spike can briefly serve up to the burst capacity before the steady rate caps throughput; excess requests receive 429 responses that well-behaved clients retry with exponential backoff. Because the account limit is shared by every API in the Region, a single misbehaving API can starve the others — which is precisely why per-method and usage-plan limits exist to fence off capacity.
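
The token-bucket behavior can be illustrated with a small simulation. This is a simplified model for intuition, not API Gateway's actual implementation: the `TokenBucket` class and `count_accepted` helper are hypothetical, and time is passed in explicitly so the result is deterministic.

```python
class TokenBucket:
    """Simplified token bucket: `rate` tokens refill per second, up to `burst`."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0             # time of the previous request, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is accepted."""
        elapsed = now - self.last
        # Refill for the time that has passed, never beyond the bucket depth.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # API Gateway would answer 429 here


def count_accepted(bucket: TokenBucket, requests: int, now: float) -> int:
    """Send `requests` requests at the same instant and count the accepted ones."""
    return sum(bucket.allow(now) for _ in range(requests))


if __name__ == "__main__":
    bucket = TokenBucket(rate=10_000, burst=5_000)  # the default account limits
    print(count_accepted(bucket, 6_000, now=0.0))   # spike larger than the burst
    print(count_accepted(bucket, 3_000, now=0.25))  # a quarter second later
```

In this model a 6,000-request spike at time zero has 5,000 requests accepted and 1,000 rejected. A quarter of a second later the bucket has refilled by 2,500 tokens, so 2,500 of the next 3,000 requests get through.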

For durability under spikes, a common pattern is API Gateway → SQS (via AWS Service integration) so bursts are buffered in a queue and consumers drain them at a safe pace, decoupling the front door from backend capacity and smoothing traffic without dropping requests.
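
One way to wire up that pattern on a REST API is sketched below. The function only builds the keyword arguments one would pass to boto3's `apigateway` client `put_integration` call. It makes no AWS call itself, and every ID, queue name, and role ARN is a placeholder. The mapping template shown is one common way to form an SQS `SendMessage` request and should be treated as an assumption to verify against your own setup.

```python
def sqs_integration_params(
    rest_api_id: str,
    resource_id: str,
    region: str,
    account_id: str,
    queue_name: str,
    role_arn: str,
) -> dict:
    """Build put_integration arguments for a direct API Gateway -> SQS call."""
    return {
        "restApiId": rest_api_id,
        "resourceId": resource_id,
        "httpMethod": "POST",             # method exposed to API clients
        "type": "AWS",                    # AWS Service integration, no Lambda
        "integrationHttpMethod": "POST",  # method API Gateway uses toward SQS
        "uri": f"arn:aws:apigateway:{region}:sqs:path/{account_id}/{queue_name}",
        "credentials": role_arn,          # role that permits sqs:SendMessage
        "requestParameters": {
            "integration.request.header.Content-Type": "'application/x-www-form-urlencoded'"
        },
        "requestTemplates": {
            # Turn the JSON request body into a form-encoded SendMessage call.
            "application/json": "Action=SendMessage&MessageBody=$util.urlEncode($input.body)"
        },
    }
```

With a boto3 client in hand, the result would be applied as `client.put_integration(**sqs_integration_params(...))`. The method, method response, and integration response still have to be defined separately.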

Test Your Knowledge

A team wants a serverless REST API that invokes Lambda and caches responses for 5 minutes to cut backend load. Which API Gateway type must they use?


An architect wants API Gateway to place incoming messages directly onto an Amazon SQS queue without running a Lambda function. Which integration type achieves this?


A public API is returning HTTP 429 responses under heavy load. What is the most likely cause and the right per-customer fix?
