1.5 Messaging: SQS, SNS, EventBridge

Key Takeaways

  • SQS standard queues give near-unlimited throughput with at-least-once delivery and best-effort ordering; FIFO queues guarantee exactly-once processing and strict ordering within a message group (by default 300 msg/s per queue, or up to 3,000 msg/s with batching).
  • Visibility timeout (default 30 s, max 12 hours) hides a received message from other consumers until it is deleted or the timeout expires; long polling (WaitTimeSeconds up to 20 s) reduces empty receives and API cost.
  • SNS is push-based pub/sub that fans one message out to many subscribers; the SNS-to-SQS fan-out pattern gives durable, parallel, independently consumed delivery.
  • EventBridge routes events by content-based rules across buses and integrates many AWS and SaaS sources; choose it over SNS when you need filtering on event content, a schema registry, or routing to many AWS-service targets.
  • Step Functions Standard workflows run up to 1 year with exactly-once execution; Express workflows run up to 5 minutes for high-volume, at-least-once processing.
Last updated: June 2026

Choosing the Right Integration Service

These services connect producers and consumers without tight coupling, and most exam items are service-selection scenarios. Match the verb in the question to the service: buffer/queue to SQS, broadcast/fan-out to SNS, route/filter to EventBridge, stream/replay to Kinesis, orchestrate to Step Functions.

SQS: Standard vs FIFO

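Feature         | Standard                             | FIFO
----------------|--------------------------------------|-----------------------------------------------------------
Throughput      | Near-unlimited                       | 300 msg/s (up to 3,000 with batching) per queue by default
Ordering        | Best-effort                          | Strict within a message group
Delivery        | At-least-once (possible duplicates)  | Exactly-once processing
De-duplication  | None                                 | 5-minute de-dup window (ID or content-based)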

Use FIFO when order or de-duplication matters (financial transactions, command sequences); use standard for maximum scale. FIFO queue names must end in .fifo.
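
As a rough illustration of the two FIFO-specific fields, the sketch below builds the keyword arguments for an SQS SendMessage call. The helper name `fifo_send_params`, the queue URL, and the account ID are hypothetical, and hashing the body with SHA-256 only mimics the idea behind content-based de-duplication; it is not a statement of how a given queue is configured.

```python
import hashlib


def fifo_send_params(queue_url, body, group_id, dedup_id=None):
    """Build keyword arguments for an SQS SendMessage call to a FIFO queue.

    Hypothetical helper: it only assembles the request, it does not call AWS.
    """
    if not queue_url.endswith(".fifo"):
        raise ValueError("FIFO queue names must end in .fifo")
    if dedup_id is None:
        # Assumption: derive the ID from the body, similar in spirit to
        # content-based de-duplication, so identical bodies share one ID.
        dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "MessageGroupId": group_id,          # ordering scope
        "MessageDeduplicationId": dedup_id,  # de-dup key within the window
    }


params = fifo_send_params(
    "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo",
    '{"order_id": 42, "action": "charge"}',
    group_id="customer-7",
)
```

Messages that share a MessageGroupId are delivered in order relative to each other, while different groups can be processed in parallel.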

Visibility Timeout, Long Polling, DLQ

When a consumer receives a message, the visibility timeout (default 30 s, max 12 hours) hides it from other consumers until it is deleted or the timer expires. Set it slightly longer than your worst-case processing time to avoid a second consumer reprocessing the same message; extend it mid-processing with ChangeMessageVisibility if needed. Long polling (WaitTimeSeconds up to 20 s) waits for messages to arrive before returning, cutting empty receives and lowering cost versus short polling.
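
The sizing advice above can be sketched as a small helper that assembles ReceiveMessage arguments. The function name `long_poll_params` and the 10-second safety margin are assumptions for illustration; the parameter names and limits (20 s wait, 12-hour visibility timeout) are the ones described in this section.

```python
MAX_VISIBILITY_SECONDS = 12 * 60 * 60  # 12-hour ceiling
MAX_WAIT_SECONDS = 20                  # long-polling ceiling


def long_poll_params(queue_url, worst_case_seconds, margin_seconds=10):
    """Build ReceiveMessage arguments that use long polling and a visibility
    timeout slightly longer than the worst-case processing time.

    Hypothetical helper: it only assembles the request, it does not call AWS.
    """
    visibility = worst_case_seconds + margin_seconds
    if not 0 <= visibility <= MAX_VISIBILITY_SECONDS:
        raise ValueError("visibility timeout must be between 0 s and 12 hours")
    return {
        "QueueUrl": queue_url,
        "MaxNumberOfMessages": 10,            # largest batch per receive
        "WaitTimeSeconds": MAX_WAIT_SECONDS,  # long polling
        "VisibilityTimeout": visibility,
    }


receive_params = long_poll_params(
    "https://sqs.us-east-1.amazonaws.com/123456789012/jobs", worst_case_seconds=45
)
```

If processing occasionally runs past the chosen timeout, extending it per message with ChangeMessageVisibility is usually preferable to raising the timeout for every message.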

After a message is received maxReceiveCount times without deletion, a redrive policy moves the poison message to a dead-letter queue (DLQ) for inspection.
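
A redrive policy is stored on the source queue as a JSON string in the RedrivePolicy attribute. The sketch below builds that attribute; the helper name `redrive_attributes`, the ARN, and the choice of 3 receives are illustrative assumptions.

```python
import json


def redrive_attributes(dlq_arn, max_receive_count=5):
    """Build the queue attribute that points a source queue at its DLQ.

    Hypothetical helper: the result could be passed as the Attributes
    argument when creating or updating the source queue.
    """
    if max_receive_count < 1:
        raise ValueError("maxReceiveCount must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    # SQS expects the policy itself to be a JSON-encoded string.
    return {"RedrivePolicy": json.dumps(policy)}


attrs = redrive_attributes(
    "arn:aws:sqs:us-east-1:123456789012:orders-dlq", max_receive_count=3
)
```

A low maxReceiveCount moves poison messages aside quickly; a higher one tolerates more transient failures before giving up.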

SNS Pub/Sub and Fan-Out

Amazon SNS (Simple Notification Service) is push-based pub/sub: a message published to a topic is delivered to every subscriber — Lambda, SQS, HTTP/S, email, SMS, and mobile push. The classic fan-out pattern is SNS to multiple SQS queues: one publish, many durable queues, each consumed independently so a slow or failing consumer never blocks the others. Message filtering with subscription filter policies lets each subscriber receive only the messages it cares about. SNS also offers FIFO topics for ordered fan-out to FIFO queues.
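
To make subscription filtering concrete, here is a deliberately simplified matcher. It handles only exact string matches on message attributes; real SNS filter policies also support operators such as prefix and numeric matching, which this sketch ignores. The policy contents and the function name `matches_filter_policy` are hypothetical.

```python
def matches_filter_policy(policy, attributes):
    """Return True if every key in the policy is present in the message
    attributes with one of the allowed values (exact match only)."""
    return all(
        attributes.get(name) in allowed for name, allowed in policy.items()
    )


# Hypothetical policy for a thumbnail-generator subscription.
thumbnail_policy = {
    "event_type": ["file_uploaded"],
    "content_type": ["image/png", "image/jpeg"],
}

png_upload = {"event_type": "file_uploaded", "content_type": "image/png"}
pdf_upload = {"event_type": "file_uploaded", "content_type": "application/pdf"}

delivered = [m for m in (png_upload, pdf_upload)
             if matches_filter_policy(thumbnail_policy, m)]
```

Values within one key are alternatives (OR), while separate keys must all match (AND), which is why the PDF upload is filtered out here.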

EventBridge vs SNS

Amazon EventBridge is a serverless event bus that routes events to targets using content-based rules (match on any field via event patterns), supports a schema registry, archive and replay, scheduled events, and ingests events from many AWS services and SaaS partners. Choose EventBridge over SNS when you need rich filtering on event content, schema awareness, many AWS-service targets, or event replay. SNS remains simpler and lower-latency for high-throughput broadcast to a known set of subscribers.
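
Content-based routing can be illustrated with a small event pattern and a simplified matcher. The pattern follows the usual shape (field names mapping to lists of accepted values, nested for nested fields), but the matcher below covers only exact matches and is an assumption-laden sketch, not EventBridge's actual matching engine. The bucket name is hypothetical.

```python
def matches_pattern(pattern, event):
    """Return True if the event satisfies the pattern.

    Simplified rules: a list means "the event value must equal one of these";
    a nested dict means "recurse into the nested object".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches_pattern(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


upload_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["uploads-example"]}},
}

event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {"bucket": {"name": "uploads-example"}, "object": {"key": "a.png"}},
}

routed = matches_pattern(upload_pattern, event)
```

Fields that the pattern does not mention, such as the object key here, are ignored, so a rule only needs to name the fields it cares about.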

Kinesis Basics

Amazon Kinesis Data Streams ingests high-volume records into shards for real-time analytics. Records are retained (default 24 h, up to 365 days) and can be replayed, and multiple consumer applications can read the same stream independently. This contrasts with SQS, where a message is typically consumed once and then deleted. Each record's partition key determines which shard it lands on, and ordering is preserved within a shard, so records that share a partition key stay in order.
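
The partition-key routing can be sketched as follows. Kinesis hashes the partition key with MD5 into a 128-bit integer and assigns the record to the shard whose hash-key range contains it; the sketch assumes the shards split that range evenly, which need not hold after resharding. The function name `shard_for_key` is hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # size of the 128-bit hash-key space


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming the shards divide
    the hash-key space into equal contiguous ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


# Records for the same device always land on the same shard, which is
# what preserves their relative order.
first = shard_for_key("device-17", shard_count=4)
second = shard_for_key("device-17", shard_count=4)
```

A partition key with few distinct values concentrates traffic on a few shards, so a high-cardinality key is the usual way to spread load.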

Step Functions: Standard vs Express

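Type      | Max duration | Execution semantics | Billing                          | Best for
----------|--------------|---------------------|----------------------------------|----------------------------------------------------------
Standard  | 1 year       | Exactly-once        | Per state transition             | Long-running, human-in-the-loop, auditable orchestration
Express   | 5 minutes    | At-least-once       | Per execution count + duration   | High-volume, short-lived event processing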

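State machines are defined in Amazon States Language (ASL), a JSON-based language with Task, Choice, Parallel, Map, and Wait states (among others). Retry and Catch are fields on a state rather than states of their own, and they provide the built-in error handling.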
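
A minimal ASL definition, written here as a Python dictionary so it can be serialized to JSON, might look like the sketch below. The state names, the Lambda ARN, and the 1000 threshold are hypothetical, and the small checker is only an illustration of how states reference each other, not a full ASL validator.

```python
import json

definition = {
    "Comment": "Illustrative order workflow (hypothetical names and ARN)",
    "StartAt": "ChargeCard",
    "States": {
        "ChargeCard": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-card",
            # Retry and Catch are fields on the Task state.
            "Retry": [{"ErrorEquals": ["States.TaskFailed"],
                       "IntervalSeconds": 2, "MaxAttempts": 3, "BackoffRate": 2.0}],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "Next": "IsLargeOrder",
        },
        "IsLargeOrder": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.amount", "NumericGreaterThan": 1000,
                         "Next": "HoldForReview"}],
            "Default": "Done",
        },
        "HoldForReview": {"Type": "Wait", "Seconds": 60, "Next": "Done"},
        "NotifyFailure": {"Type": "Fail", "Error": "ChargeFailed",
                          "Cause": "Card could not be charged"},
        "Done": {"Type": "Succeed"},
    },
}


def referenced_states(defn):
    """Collect every state name that the definition transitions to."""
    names = {defn["StartAt"]}
    for state in defn["States"].values():
        for key in ("Next", "Default"):
            if key in state:
                names.add(state[key])
        for rule in state.get("Choices", []) + state.get("Catch", []):
            names.add(rule["Next"])
    return names


asl_json = json.dumps(definition, indent=2)
```

Checking that every referenced name exists in States catches the most common authoring mistake, a transition to a misspelled or missing state.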

SQS Sizing, Delay, and Lambda Integration

An SQS message is up to 256 KB; for larger payloads use the Extended Client Library to store the body in S3 and pass a pointer. Delay queues postpone delivery of all new messages up to 15 minutes, while message timers delay individual messages. When SQS is an event source for Lambda, Lambda polls and scales consumers automatically; the function's reserved concurrency effectively caps how fast the queue drains, and partial-batch-failure reporting lets you return only the failed message IDs so successful ones are not reprocessed.
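
Partial-batch-failure reporting can be sketched as a Lambda handler that returns the IDs of only the records that failed. The `batchItemFailures` response shape is what Lambda expects when ReportBatchItemFailures is enabled on the event source mapping; the `process` function and its validation rule are hypothetical stand-ins for real business logic.

```python
import json


def process(order):
    """Hypothetical business logic: reject orders without an order_id."""
    if "order_id" not in order:
        raise ValueError("missing order_id")


def handler(event, context):
    """Process each SQS record and report only the failed message IDs,
    so successfully processed messages are not retried."""
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


sample_event = {
    "Records": [
        {"messageId": "m1", "body": '{"order_id": 1}'},
        {"messageId": "m2", "body": '{"note": "no id"}'},
        {"messageId": "m3", "body": "not json"},
    ]
}
result = handler(sample_event, None)
```

Returning an empty list signals that the whole batch succeeded; letting an exception escape the handler instead would cause the entire batch to be retried.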

Choosing Between the Services and Common Traps

Need                                                              | Service
------------------------------------------------------------------|------------------------------------------
Decouple and buffer with retries                                  | SQS (standard, or FIFO for order/de-dup)
Broadcast one message to many subscribers                         | SNS
Route by event content to many AWS/SaaS targets, with replay      | EventBridge
High-volume ordered streaming with multiple consumers and replay  | Kinesis Data Streams
Coordinate multi-step, possibly long-running workflows            | Step Functions

Frequent traps: picking a standard queue when the scenario says "in order" or "no duplicates" (use FIFO); using a single shared queue for fan-out (one consumer steals messages — use SNS-to-SQS); choosing SNS when the scenario needs content filtering and AWS-service routing (use EventBridge); and selecting Express Step Functions for a multi-day human approval (the 5-minute cap rules it out — use Standard).

One more recurring distinction: SQS is a pull model where consumers poll and a message is processed once then deleted, whereas SNS is a push model that delivers to all subscribers and keeps no durable buffer of its own — which is exactly why the fan-out pattern pairs SNS with SQS to add durability per subscriber. For human-approval steps inside Step Functions, the .waitForTaskToken integration pattern pauses the workflow until an external system calls back with the token, the canonical way to model approvals and callbacks.
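
The callback pattern can be sketched in two parts: a Task state that sends the task token to an approver via SQS, and a helper that builds the arguments for the later SendTaskSuccess or SendTaskFailure call. The queue URL, the state name "Approved", the message fields, and the helper name `approval_callback` are all hypothetical.

```python
import json

# Task state using the .waitForTaskToken pattern: the workflow pauses here
# until something calls back with the token taken from the context object.
approval_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
    "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/approvals",
        "MessageBody": {
            "TaskToken.$": "$$.Task.Token",
            "Request.$": "$.request",
        },
    },
    "Next": "Approved",
}


def approval_callback(message_body, approved):
    """Return the Step Functions API action name and its arguments for
    resuming the paused workflow (this sketch does not call AWS)."""
    token = json.loads(message_body)["TaskToken"]
    if approved:
        output = json.dumps({"approved": True})
        return "SendTaskSuccess", {"taskToken": token, "output": output}
    return "SendTaskFailure", {
        "taskToken": token,
        "error": "Rejected",
        "cause": "Approver declined the request",
    }


action, args = approval_callback(
    json.dumps({"TaskToken": "example-token", "Request": {"id": 7}}), approved=True
)
```

Because the workflow may wait for days, this pattern belongs in a Standard workflow rather than an Express one.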

Test Your Knowledge

An order-processing system must process messages strictly in the order they were sent and must never process a duplicate. Which SQS configuration is required?


A single uploaded-file event must trigger three independent systems — a thumbnail generator, an audit logger, and a search indexer — each with its own durable buffer so a slow consumer never blocks the others. Which pattern fits?


A workflow orchestrates a multi-step approval process that can pause for days waiting on human input and must execute exactly once. Which Step Functions workflow type is appropriate?
