2.6 AWS Step Functions and Workflow Orchestration

Key Takeaways

  • AWS Step Functions orchestrate multi-step workflows using a visual state machine with built-in error handling, retries, and parallel execution.
  • Standard Workflows run for up to 1 year and guarantee exactly-once execution; Express Workflows run for up to 5 minutes with at-least-once execution (asynchronous) or at-most-once execution (synchronous).
  • Step Functions integrate with 200+ AWS services including Lambda, ECS, DynamoDB, SQS, SNS, and Glue.
  • Use Step Functions when you need to coordinate multiple services in sequence, parallel, or with branching logic — not just simple event-driven processing.
  • Built-in error handling with the Retry and Catch fields of a state removes most of the need for custom retry logic in application code.

Last updated: March 2026

Quick Answer: Step Functions orchestrate complex workflows as visual state machines. Use Standard Workflows for long-running processes (up to 1 year, exactly-once). Use Express Workflows for high-volume, short-duration processes (up to 5 minutes, at-least-once). Built-in error handling eliminates custom retry logic.

What Are Step Functions?

AWS Step Functions is a serverless orchestration service that lets you coordinate multiple AWS services into visual workflows called state machines.

State Types

| State | Description | Use Case |
|---|---|---|
| Task | Perform work (invoke Lambda, call API) | Process an order |
| Choice | Branching logic (if/else) | Route based on order type |
| Parallel | Execute branches simultaneously | Process payment AND update inventory simultaneously |
| Map | Run the same steps for each item in an array | Process each item in an order |
| Wait | Delay for a specified time | Wait 24 hours before sending follow-up |
| Pass | Pass input to output (useful for data transformation) | Format data between steps |
| Succeed/Fail | End execution successfully or with an error | Mark workflow complete |
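
To make the state types concrete, here is a minimal sketch of a state machine definition in Amazon States Language, built as a Python dictionary and serialized to JSON. The Lambda ARNs, the account ID, the state names, and the `$.orderType` input field are all hypothetical; the example only illustrates how Choice, Parallel, Task, Succeed, and Fail states fit together.

```python
import json

# Hypothetical ARN prefix; replace with the ARNs of real Lambda functions.
LAMBDA_ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"

definition = {
    "Comment": "Illustrative order workflow (a sketch, not a production definition)",
    "StartAt": "CheckOrderType",
    "States": {
        # Choice: branch on a field of the input document (assumed field name).
        "CheckOrderType": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.orderType",
                    "StringEquals": "standard",
                    "Next": "FulfillOrder",
                }
            ],
            "Default": "RejectOrder",
        },
        # Parallel: both branches run at the same time.
        "FulfillOrder": {
            "Type": "Parallel",
            "Branches": [
                {
                    "StartAt": "ProcessPayment",
                    "States": {
                        "ProcessPayment": {
                            "Type": "Task",
                            "Resource": LAMBDA_ARN_PREFIX + "ProcessPayment",
                            "End": True,
                        }
                    },
                },
                {
                    "StartAt": "UpdateInventory",
                    "States": {
                        "UpdateInventory": {
                            "Type": "Task",
                            "Resource": LAMBDA_ARN_PREFIX + "UpdateInventory",
                            "End": True,
                        }
                    },
                },
            ],
            "Next": "OrderComplete",
        },
        # Succeed and Fail are terminal states.
        "OrderComplete": {"Type": "Succeed"},
        "RejectOrder": {
            "Type": "Fail",
            "Error": "UnsupportedOrderType",
            "Cause": "Only 'standard' orders are handled in this sketch",
        },
    },
}


def state_types(states):
    """Collect the state types used, including those nested in Parallel branches."""
    found = set()
    for state in states.values():
        found.add(state["Type"])
        for branch in state.get("Branches", []):
            found |= state_types(branch["States"])
    return found


# The JSON string is what would be supplied as the state machine definition.
asl_json = json.dumps(definition, indent=2)
print(sorted(state_types(definition["States"])))
```

Every non-terminal state names its successor with `Next`, and each Parallel branch is itself a small state machine with its own `StartAt` and `States`.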

Standard vs. Express Workflows

| Feature | Standard | Express |
|---|---|---|
| Max duration | 1 year | 5 minutes |
| Execution model | Exactly-once | At-least-once (asynchronous); at-most-once (synchronous) |
| Pricing | Per state transition | Per execution + duration |
| Execution history | Yes (visible in console) | CloudWatch Logs only |
| Best for | Order processing, ETL, human approval | IoT data processing, streaming ETL, high-volume APIs |
| Execution start rate | 2,000/sec | 100,000/sec |
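
The duration and execution-model rows of the comparison can be sketched as a small decision helper. The function name `choose_workflow_type` and its parameters are hypothetical and exist only for illustration; the returned strings happen to match the values accepted by the `type` parameter of the CreateStateMachine API. The sketch deliberately ignores pricing and throughput, which also matter in practice.

```python
# Limits taken from the comparison above.
STANDARD_MAX_SECONDS = 365 * 24 * 60 * 60  # 1 year
EXPRESS_MAX_SECONDS = 5 * 60               # 5 minutes


def choose_workflow_type(max_duration_seconds, needs_exactly_once):
    """Pick a workflow type from duration and execution-semantics needs only."""
    if max_duration_seconds > STANDARD_MAX_SECONDS:
        raise ValueError("duration exceeds the 1-year Standard Workflow limit")
    # Express cannot run longer than 5 minutes and is not exactly-once.
    if needs_exactly_once or max_duration_seconds > EXPRESS_MAX_SECONDS:
        return "STANDARD"
    return "EXPRESS"


# A multi-day approval process needs Standard; a short idempotent job fits Express.
print(choose_workflow_type(3 * 24 * 60 * 60, needs_exactly_once=False))
print(choose_workflow_type(30, needs_exactly_once=False))
```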

Error Handling

Step Functions provide built-in error handling that eliminates custom retry logic:

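  • Retry: Automatically retry a failed state, with a configurable interval, maximum number of attempts, and backoff rate
  • Catch: Handle errors that are not retried (or that exhaust their retries) and route to a recovery state
  • Timeout: Set the maximum run time of a Task state (TimeoutSeconds); exceeding it raises States.Timeout
  • Heartbeat: Fail a long-running task early if its worker stops reporting progress within HeartbeatSeconds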
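
The error-handling fields above can be sketched on a single Task state, again as a Python dictionary in Amazon States Language form. The Lambda ARN and the state names `NotifyFailure` and `UpdateInventory` are hypothetical. `HeartbeatSeconds` is left out because it only applies when a worker reports heartbeats (activities or callback tasks). The helper `retry_delays` is an illustrative calculation of the wait before each retry; it ignores the optional `MaxDelaySeconds` and `JitterStrategy` fields.

```python
import json

task_state = {
    "Type": "Task",
    # Hypothetical function ARN.
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ValidatePayment",
    "TimeoutSeconds": 300,  # raise States.Timeout after 5 minutes
    "Retry": [
        {
            "ErrorEquals": ["States.Timeout", "States.TaskFailed"],
            "IntervalSeconds": 2,   # wait before the first retry
            "MaxAttempts": 3,       # number of retries, not counting the first try
            "BackoffRate": 2.0,     # multiplier applied to the wait after each retry
        }
    ],
    "Catch": [
        {
            # States.ALL matches any error that was not handled by Retry.
            "ErrorEquals": ["States.ALL"],
            "ResultPath": "$.error",   # keep the original input, add error details
            "Next": "NotifyFailure",   # hypothetical recovery state
        }
    ],
    "Next": "UpdateInventory",         # hypothetical next state
}


def retry_delays(retrier):
    """Seconds waited before each retry: IntervalSeconds * BackoffRate ** n."""
    interval = retrier.get("IntervalSeconds", 1)   # ASL default
    attempts = retrier.get("MaxAttempts", 3)       # ASL default
    rate = retrier.get("BackoffRate", 2.0)         # ASL default
    return [interval * rate ** n for n in range(attempts)]


print(json.dumps(task_state, indent=2))
print(retry_delays(task_state["Retry"][0]))
```

With these settings the state waits 2, 4, and then 8 seconds between attempts; if the third retry also fails, the Catch block routes execution to the recovery state instead of failing the whole workflow.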

Integration Patterns

| Pattern | Description |
|---|---|
| Request Response | Call service, wait for HTTP response, continue |
| Run a Job (.sync) | Start a job (e.g., Glue, Batch), wait for completion |
| Wait for Callback | Send token, pause execution, resume when callback received |
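
In a state machine definition, the integration pattern is selected by a suffix on the Task state's `Resource` ARN. The sketch below shows this under a few assumptions: the helper `task_resource` and its pattern names are hypothetical, the queue URL and `$.orderId` field are placeholders, and not every service action supports every pattern.

```python
def task_resource(service_action, pattern="request_response"):
    """Build an optimized-integration Resource ARN for the chosen pattern."""
    suffixes = {
        "request_response": "",                    # continue after the API response
        "run_a_job": ".sync",                      # wait for the job to finish
        "wait_for_callback": ".waitForTaskToken",  # pause until a token is returned
    }
    return "arn:aws:states:::" + service_action + suffixes[pattern]


# Wait for Callback: pass the task token to an external worker through SQS.
callback_state = {
    "Type": "Task",
    "Resource": task_resource("sqs:sendMessage", "wait_for_callback"),
    "Parameters": {
        # Hypothetical queue URL.
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/approvals",
        "MessageBody": {
            "orderId.$": "$.orderId",        # assumed input field
            "taskToken.$": "$$.Task.Token",  # token taken from the context object
        },
    },
    "End": True,
}

print(task_resource("sqs:sendMessage"))
print(task_resource("glue:startJobRun", "run_a_job"))
print(callback_state["Resource"])
```

The execution stays paused at the callback state until some worker returns the token by calling the SendTaskSuccess or SendTaskFailure API.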

On the Exam: "Orchestrate a multi-step order processing workflow with error handling" → Step Functions. "Simple async message processing between two services" → SQS. Step Functions are for orchestration; SQS/SNS are for simple decoupling.

Test Your Knowledge

A workflow needs to process an order: validate payment, update inventory, send confirmation email, and handle failures at each step with retries. Which service is BEST suited?

Test Your Knowledge

Which Step Functions workflow type should you use for a long-running order approval process that may take several days to complete?
