7.2 Advanced SQS Patterns — DLQ, Backpressure, and FIFO

Key Takeaways

  • A Dead-Letter Queue (DLQ) captures messages that fail after maxReceiveCount delivery attempts, isolating poison-pill messages so one bad record cannot block the whole queue.
  • Standard queues give at-least-once delivery and best-effort ordering at nearly unlimited throughput; FIFO queues give exactly-once processing and strict ordering at 300 (or 3,000 batched) messages per second.
  • FIFO ordering is scoped to the Message Group ID: messages sharing a group are strictly serial, while different groups run in parallel, so the group ID is your parallelism key.
  • Long polling (WaitTimeSeconds up to 20 seconds) sharply reduces empty ReceiveMessage responses, cutting API request charges versus short polling.
  • Backpressure uses queue depth (ApproximateNumberOfMessagesVisible) to drive Auto Scaling of consumers, letting SQS absorb bursts so downstream systems are never overwhelmed.

Last updated: June 2026

Standard vs. FIFO: The First Decision

Every Amazon SQS question starts here. Standard queues deliver at-least-once (occasional duplicates) with best-effort ordering and effectively unlimited throughput. FIFO queues guarantee exactly-once processing and strict first-in-first-out ordering, but are capped at 300 API calls per second per action (send, receive, delete). That is 300 messages/second unbatched, or 3,000 messages/second with batches of 10 messages per call. High-throughput mode for FIFO raises these limits further, with the exact quota varying by Region.

| Feature | Standard | FIFO |
|---|---|---|
| Ordering | Best-effort | Strict, per Message Group ID |
| Delivery | At-least-once (duplicates possible) | Exactly-once |
| Throughput | Nearly unlimited | 300/s, or 3,000/s batched |
| Name suffix | none | must end in .fifo |

Exam trap: "financial transactions, no duplicates, in order per account" is FIFO. "Decouple a high-volume image pipeline, occasional reprocessing is fine" is Standard. Do not choose FIFO just because ordering sounds nice; its throughput cap can disqualify it.
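The throughput check behind that trap is simple arithmetic. The following is a minimal sketch, assuming the default FIFO quota quoted above (no high-throughput mode); the function name `fifo_can_sustain` is hypothetical and not part of any AWS SDK.

```python
import math

FIFO_CALLS_PER_SECOND = 300  # per API action at the default quota
MAX_BATCH_SIZE = 10          # messages per batched call


def fifo_can_sustain(messages_per_second: float, batch_size: int = 1) -> bool:
    """Return True if a default-quota FIFO queue can carry the given load."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError("batch_size must be between 1 and 10")
    # Each call moves up to batch_size messages, so count calls, not messages.
    calls_needed = math.ceil(messages_per_second / batch_size)
    return calls_needed <= FIFO_CALLS_PER_SECOND
```

For example, `fifo_can_sustain(2500, batch_size=10)` is true while `fifo_can_sustain(2500)` is false, which is the kind of gap that pushes a high-volume workload toward Standard or high-throughput FIFO.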

Dead-Letter Queues and Redrive

A Dead-Letter Queue is an ordinary SQS queue you attach to a source queue via a redrive policy. The flow:

  1. A message is received; the consumer fails to process it.
  2. After the visibility timeout expires, the message becomes visible again.
  3. Once the receive count hits maxReceiveCount (for example, 3), SQS moves the message to the DLQ instead of redelivering forever.
  4. You alarm on DLQ depth, fix the bug, then use redrive to move messages back to the source queue.

| Setting | Guidance |
|---|---|
| maxReceiveCount | Typically 3–5; too low loses transient-failure retries |
| DLQ retention | Set longer than the source (up to 14 days) so you have time to investigate |
| Queue-type match | Standard source needs a Standard DLQ; FIFO source needs a FIFO DLQ |
| Alarm | CloudWatch alarm on ApproximateNumberOfMessagesVisible of the DLQ |
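A redrive policy is a small JSON document stored in the source queue's RedrivePolicy attribute. The sketch below only builds that attribute map; the helper name `redrive_policy` and the ARN in the usage note are illustrative assumptions, and applying the result to a real queue (for instance through the `Attributes` argument of boto3's `set_queue_attributes`) is left out.

```python
import json


def redrive_policy(dlq_arn: str, max_receive_count: int = 3) -> dict:
    """Build the queue attribute map that attaches a DLQ to a source queue."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,        # ARN of the dead-letter queue
        "maxReceiveCount": max_receive_count,  # receives allowed before the move
    }
    # SQS expects the policy as a JSON string, not a nested mapping.
    return {"RedrivePolicy": json.dumps(policy)}
```

Calling `redrive_policy("arn:aws:sqs:us-east-1:123456789012:orders-dlq")` (a made-up ARN) returns a one-key dictionary whose value is the serialized policy.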

Worked example: A consumer crashes on a malformed record. Without a DLQ that one message keeps being redelivered until its retention period expires, burning consumer capacity and, on a FIFO queue, blocking every later message in its group (a "poison pill"). With maxReceiveCount of 3 and a DLQ, the bad record is sidelined after three tries and processing continues.
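The worked example can be simulated in memory. This is a simplified sketch, not SQS itself: the `drain` function is hypothetical, there is no visibility timeout, and a failed message is simply put back at the end of the queue with its receive count incremented.

```python
from collections import deque


def drain(messages, handler, max_receive_count=3):
    """Simulate a consumer loop with a redrive policy.

    Returns (processed, dlq): bodies handled successfully and bodies that
    were sidelined after max_receive_count failed attempts.
    """
    queue = deque((body, 0) for body in messages)  # (body, receive count)
    processed, dlq = [], []
    while queue:
        body, receives = queue.popleft()
        if receives >= max_receive_count:
            dlq.append(body)  # retries exhausted: move to the dead-letter queue
            continue
        try:
            handler(body)
            processed.append(body)
        except Exception:
            queue.append((body, receives + 1))  # becomes visible again later
    return processed, dlq
```

With `int` as the handler, `drain(["1", "oops", "2"], int)` processes "1" and "2" and leaves "oops" in the simulated DLQ after three attempts.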

FIFO Internals: Group ID, Deduplication, Visibility

Message Group ID is the parallelism control. All messages sharing a group are processed strictly in order; messages in different groups process concurrently. Using a customer ID or device ID as the group ID gives per-entity ordering with cross-entity parallelism.

| Goal | Group ID strategy | Result |
|---|---|---|
| Total global order | One shared group ID | Fully serial (slow) |
| Order per customer | Customer ID | Serial per customer, parallel across customers |
| Order per device | Device ID | Serial per device, parallel across devices |
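One way to picture the group ID as a parallelism key is to split a stream into per-group lanes. The sketch below is an in-memory illustration under that assumption; `partition_by_group` is a hypothetical helper, not an SQS API.

```python
from collections import defaultdict


def partition_by_group(messages):
    """Split (group_id, body) pairs into per-group lanes, keeping send order."""
    lanes = defaultdict(list)
    for group_id, body in messages:
        lanes[group_id].append(body)  # order within a group is preserved
    return dict(lanes)
```

Each lane must be worked through serially, but separate lanes can go to separate workers. A single shared group ID collapses everything into one lane, which is why global ordering is the slow option in the table.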

Deduplication prevents duplicate sends within a 5-minute window. Use content-based dedup (SQS hashes the body) when identical bodies mean duplicates, or supply an explicit MessageDeduplicationId when you control idempotency keys.
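The deduplication window can be modeled with a small in-memory class. This is a sketch under simplifying assumptions: `DedupWindow` is a hypothetical name, timestamps are passed in explicitly as seconds, and content-based deduplication is approximated by a SHA-256 hash of the body.

```python
import hashlib
from typing import Optional

DEDUP_WINDOW_SECONDS = 300  # the 5-minute deduplication interval


class DedupWindow:
    """Accept a message unless the same dedup ID was accepted in the window."""

    def __init__(self):
        self._seen = {}  # dedup ID -> time the message was first accepted

    def accept(self, body: str, now: float, dedup_id: Optional[str] = None) -> bool:
        # No explicit ID: fall back to a hash of the body (content-based dedup).
        key = dedup_id or hashlib.sha256(body.encode("utf-8")).hexdigest()
        first_seen = self._seen.get(key)
        if first_seen is not None and now - first_seen < DEDUP_WINDOW_SECONDS:
            return False  # duplicate inside the window: dropped
        self._seen[key] = now
        return True
```

Note the trade-off this exposes: with content-based deduplication, two legitimately distinct messages that happen to have identical bodies collapse into one, so an explicit ID is the safer choice when bodies can repeat.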

Visibility timeout is the silent culprit behind duplicate processing on Standard queues: if processing takes longer than the timeout, the message reappears and a second consumer grabs it. Tune the timeout above your worst-case processing time, or call ChangeMessageVisibility to extend it mid-flight.
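The timeout rule reduces to a comparison plus some headroom. A minimal sketch follows, assuming a 1.5x safety factor (an arbitrary illustrative choice, not an AWS recommendation) and the 12-hour ceiling on visibility timeout; both function names are hypothetical.

```python
import math

MAX_VISIBILITY_TIMEOUT = 12 * 60 * 60  # 43,200 seconds


def will_be_redelivered(processing_seconds: float, visibility_timeout: float) -> bool:
    """True if the message reappears before the consumer finishes."""
    return processing_seconds > visibility_timeout


def suggested_timeout(worst_case_seconds: float, headroom: float = 1.5) -> int:
    """Worst-case processing time plus headroom, capped at the SQS maximum."""
    return min(math.ceil(worst_case_seconds * headroom), MAX_VISIBILITY_TIMEOUT)
```

A job that can take 45 seconds against the default 30-second timeout will be redelivered; when the worst case cannot be bounded, extending the timeout from inside the consumer with ChangeMessageVisibility is the alternative.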

Long Polling and Backpressure

| Polling type | Behavior | Cost |
|---|---|---|
| Short polling | Returns immediately, often empty | Higher API charges |
| Long polling (WaitTimeSeconds 1–20) | Waits up to 20 s for a message | Lower, fewer empty calls |
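To see where the savings come from, count the receive calls one consumer makes against an idle queue. The sketch below is illustrative only: `receive_params` builds a parameter map using the ReceiveMessage parameter names, `idle_calls_per_day` is a hypothetical helper, and the queue URL in the usage note is a placeholder.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def receive_params(queue_url: str, wait_seconds: int = 20, max_messages: int = 10) -> dict:
    """Build ReceiveMessage parameters; wait_seconds > 0 enables long polling."""
    if not 0 <= wait_seconds <= 20:
        raise ValueError("wait_seconds must be between 0 and 20")
    if not 1 <= max_messages <= 10:
        raise ValueError("max_messages must be between 1 and 10")
    return {
        "QueueUrl": queue_url,
        "WaitTimeSeconds": wait_seconds,
        "MaxNumberOfMessages": max_messages,
    }


def idle_calls_per_day(seconds_per_call: int) -> int:
    """Receive calls one consumer makes in a day against an empty queue."""
    return SECONDS_PER_DAY // seconds_per_call
```

A consumer that short-polls once per second makes 86,400 calls a day on an empty queue, while a 20-second long poll makes 4,320, a 20-fold reduction in billable requests.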

Backpressure decouples producer rate from consumer rate: producers push freely, SQS buffers the surge, and an Auto Scaling policy on the ApproximateNumberOfMessagesVisible metric adds consumers as the backlog grows and removes them as it drains, so a downstream database is never flooded.
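The scaling decision can be sketched as a backlog-per-consumer calculation. This is a simplified model, not the Auto Scaling service's own algorithm: `desired_consumers`, the target backlog, and the minimum and maximum fleet sizes are all assumptions chosen for illustration.

```python
import math


def desired_consumers(
    visible_messages: int,
    target_backlog_per_consumer: int,
    min_consumers: int = 1,
    max_consumers: int = 20,
) -> int:
    """Consumers needed so each one holds at most the target backlog."""
    if target_backlog_per_consumer <= 0:
        raise ValueError("target_backlog_per_consumer must be positive")
    wanted = math.ceil(visible_messages / target_backlog_per_consumer)
    # Clamp to the fleet bounds; the upper bound is what shields the database.
    return max(min_consumers, min(wanted, max_consumers))
```

The `max_consumers` cap is the actual protection for the downstream database: during a burst the fleet stops growing at that ceiling and the excess simply waits in the queue.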

Delivery Delays, Retention, and Choosing SNS, SQS, or EventBridge

Four timing knobs frequently appear in answer choices, and confusing them is a classic distractor.

| Setting | Range | What it does |
|---|---|---|
| Visibility timeout | 0 s to 12 hours (default 30 s) | Hides a received message from other consumers while it is processed |
| Message retention | 1 minute to 14 days (default 4 days) | How long an unprocessed message survives in the queue |
| Delivery delay | 0 to 15 minutes | Postpones first delivery of a new message |
| WaitTimeSeconds | 0 to 20 seconds | Long-poll wait per receive call |

A delay queue (a queue-level delivery delay) is the right answer for "hold every message 5 minutes before processing," whereas a per-message timer sets DelaySeconds on an individual message (supported on Standard queues only). Do not confuse delay with visibility timeout: delay applies before first delivery; visibility timeout applies after a message has been received.
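The difference between the two settings is easiest to see on a timeline. The sketch below is a toy model with a single receive and times in seconds; `message_state` and its state labels are hypothetical.

```python
from typing import Optional


def message_state(
    now: float,
    sent_at: float,
    delay_seconds: float,
    received_at: Optional[float] = None,
    visibility_timeout: float = 30,
) -> str:
    """Classify a message as 'delayed', 'in flight', or 'visible' at time now."""
    if now < sent_at + delay_seconds:
        return "delayed"    # delivery delay: nobody can receive it yet
    if received_at is not None and received_at <= now < received_at + visibility_timeout:
        return "in flight"  # visibility timeout: hidden from other consumers
    return "visible"
```

With a 300-second delay, a message sent at t=0 is "delayed" until t=300; if it is received at t=305 and never deleted, it is "in flight" until t=335 and then "visible" again.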

Service-selection trap: SQS, SNS, and EventBridge all "decouple," so read the verb. "Buffer and process at the consumer's pace, possibly with many retries" is SQS. "Notify multiple subscribers instantly" is SNS. "Route different event types to different targets based on content rules, including SaaS sources" is EventBridge. For a message larger than the SQS maximum payload size (long 256 KB; AWS raised it to 1 MiB in August 2025), use the SQS Extended Client Library, which stores the payload in S3 and passes a pointer, rather than splitting the message.
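The pointer technique used for oversized messages is often called a claim check. The following is a minimal in-memory sketch of the idea only: a plain dictionary stands in for S3, the size limit is passed in by the caller, and the pointer format is invented for illustration rather than being the Extended Client's actual wire format.

```python
import json
import uuid


def prepare_message(payload: bytes, max_bytes: int, store: dict) -> str:
    """Return a message body: the payload itself, or a pointer if it is too big."""
    if len(payload) <= max_bytes:
        return payload.decode("utf-8")  # small enough to send inline
    key = str(uuid.uuid4())
    store[key] = payload                # stand-in for an S3 put
    # Hypothetical pointer layout: the consumer fetches the payload by key.
    return json.dumps({"s3_key": key, "size": len(payload)})
```

The consumer reverses the process: it parses the pointer, fetches the object, and deletes both the message and the stored payload once processing succeeds.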

Worked example: An order pipeline must guarantee no order is lost even if the processor is down for an hour. Set the queue retention to 14 days and let Auto Scaling restart consumers; SQS durably retains the backlog and processing resumes with zero data loss once capacity returns.

Test Your Knowledge

A payments system must process transactions exactly once and in submission order per account, while still allowing different accounts to be processed in parallel. How should SQS be configured?


Messages that hit a code bug are being redelivered indefinitely, blocking healthy messages behind them. What is the correct remedy?


A team sees their SQS bill dominated by ReceiveMessage requests, most of which return no messages. Which change reduces cost without losing messages?


Producers send bursts far faster than a downstream database can absorb, occasionally overwhelming it. Which SQS-based design protects the database?
