All Practice Exams

200+ Free AWS Data Engineer Practice Questions

Pass your AWS Certified Data Engineer – Associate (DEA-C01) exam on the first try — instant access, no signup required.

✓ No registration ✓ No credit card ✓ No hidden fees ✓ Start practicing immediately
~65% Pass Rate
200+ Questions
100% Free

Key Facts: AWS Data Engineer Exam

  • Estimated Pass Rate: ~65% (industry estimate)
  • Passing Score: 720/1000 (AWS, estimated)
  • Study Time: 80-120 hrs (recommended)
  • Largest Domain: Data Ingestion (34%)
  • Total Questions: 65 (50 scored + 15 unscored)
  • Exam Fee: $150 (AWS)

The AWS Data Engineer Associate (DEA-C01) requires an estimated scaled score of 720/1000 to pass. The exam has 65 questions (50 scored + 15 unscored) in 170 minutes. Domain 1 (Data Ingestion and Transformation) is the largest at 34%, followed by Domain 2 (Data Store Management) at 26%, Domain 3 (Data Operations and Support) at 22%, and Domain 4 (Data Security and Governance) at 18%. The exam fee is $150.

Sample AWS Data Engineer Practice Questions

Try these sample questions to test your AWS Data Engineer exam readiness. Each question includes a detailed explanation. Start the interactive quiz above for the full 200+ question experience with AI tutoring.

1. A company needs to collect and process streaming data from IoT sensors at a rate of 10,000 records per second. The data needs to be analyzed in near real-time and stored in S3 for long-term analysis. Which AWS service combination is MOST appropriate?
A. Amazon Kinesis Data Streams with AWS Lambda
B. Amazon Kinesis Data Firehose with Amazon S3
C. Amazon MSK with Amazon Redshift
D. AWS Data Pipeline with Amazon EC2
Explanation: Amazon Kinesis Data Firehose is the best choice for this scenario because it is designed to automatically capture, transform, and load streaming data into S3 without requiring custom code. It handles the scaling automatically and supports near real-time delivery. Kinesis Data Streams requires custom consumers (like Lambda or EC2), adding complexity. MSK (Managed Kafka) is overkill for this simple ingestion pattern. Data Pipeline is a legacy service primarily for batch processing, not streaming.
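At 10,000 records per second, producers typically send to Firehose through the PutRecordBatch API, which accepts at most 500 records per call. A minimal client-side batching sketch (the delivery stream name is hypothetical, and the boto3 call is shown commented out rather than executed):

```python
# Sketch: batching IoT records for Firehose's PutRecordBatch API,
# which caps each call at 500 records (and 4 MiB total).

def batch_for_firehose(records, max_records=500):
    """Split an iterable of records into PutRecordBatch-sized chunks."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == max_records:
            yield batch
            batch = []
    if batch:
        yield batch

# With boto3 (not run here), each batch would be delivered like:
#   firehose = boto3.client("firehose")
#   firehose.put_record_batch(
#       DeliveryStreamName="iot-sensor-stream",   # hypothetical name
#       Records=[{"Data": json.dumps(r).encode()} for r in batch],
#   )

print([len(b) for b in batch_for_firehose(range(1200))])  # [500, 500, 200]
```

Firehose then buffers, optionally transforms, and delivers the batches to S3 without any consumer code on your side.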
2. Which Amazon Kinesis feature allows you to replay data from a specific point in time within the retention period?
A. Kinesis Data Firehose buffering
B. Kinesis Data Streams enhanced fan-out
C. Kinesis Data Streams record processor checkpointing
D. Kinesis Video Streams fragmentation
Explanation: Kinesis Data Streams supports replay capability through checkpointing with the Kinesis Client Library (KCL). Consumers can checkpoint their progress and restart from any checkpoint within the retention period (default 24 hours, up to 365 days). Enhanced fan-out improves read throughput but does not provide replay. Firehose is for delivery to destinations and does not support replay. Kinesis Video Streams is for video data, not general streaming.
3. A data engineer needs to process clickstream data with multiple independent applications reading the same stream. Each application must read at its own pace without affecting others. Which Kinesis feature should be used?
A. Kinesis Data Streams with shared throughput
B. Kinesis Data Streams with enhanced fan-out
C. Kinesis Data Firehose with multiple destinations
D. Kinesis Analytics with multiple outputs
Explanation: Enhanced fan-out provides dedicated read throughput of 2 MB/sec per shard to each consumer, allowing multiple applications to read independently without competing for throughput. Standard shared throughput (option A) means consumers share the 2 MB/sec per shard. Firehose does not support multiple independent consumers reading at their own pace. Kinesis Analytics is for real-time SQL processing, not fan-out.
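The throughput difference is simple arithmetic and worth internalizing for the exam. A sketch of the per-consumer read bandwidth on a single shard in each mode:

```python
# Sketch: read throughput per consumer on one Kinesis shard.
# Shared (classic) consumers split 2 MB/s per shard; enhanced fan-out
# gives each registered consumer its own dedicated 2 MB/s per shard.

SHARD_READ_MBPS = 2.0

def per_consumer_mbps(num_consumers, enhanced_fan_out):
    if enhanced_fan_out:
        return SHARD_READ_MBPS                  # dedicated pipe per consumer
    return SHARD_READ_MBPS / num_consumers      # shared among all consumers

print(per_consumer_mbps(4, enhanced_fan_out=False))  # 0.5
print(per_consumer_mbps(4, enhanced_fan_out=True))   # 2.0
```

With four shared consumers each gets only 0.5 MB/s per shard; with enhanced fan-out each keeps the full 2 MB/s.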
4. What is the maximum retention period for data in Amazon Kinesis Data Streams?
A. 24 hours
B. 7 days
C. 365 days
D. Indefinite
Explanation: Kinesis Data Streams supports a maximum retention period of 365 days (1 year). The default is 24 hours, but you can increase it to 7 days or up to 365 days for an additional charge. This long-term retention enables replay of historical data for new applications or reprocessing after bug fixes.
5. A company needs to build an ETL pipeline that transforms data from S3 and loads it into Redshift. The transformation requires complex business logic written in Python. Which AWS service should be used?
A. AWS Data Pipeline with custom EC2 instances
B. AWS Glue with PySpark jobs
C. Amazon EMR with Hive scripts
D. AWS Lambda with Python runtime
Explanation: AWS Glue is a serverless ETL service that supports PySpark for complex transformations. It can read from S3, apply transformations, and write to Redshift without managing infrastructure. Data Pipeline is legacy and requires managing resources. EMR requires cluster management. Lambda has a 15-minute timeout limit, making it unsuitable for large ETL jobs.
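The "complex business logic" in a Glue job is usually a plain Python function applied per record with DynamicFrame.map. A sketch of that shape (field, database, and connection names are hypothetical; the Glue boilerplate is shown commented out since it only runs inside a Glue job):

```python
# Sketch: per-record business logic a Glue PySpark job would apply
# with DynamicFrame.map before writing the result to Redshift.

def enrich(record):
    """Example transformation: normalize a field and derive another."""
    record["country"] = record.get("country", "unknown").upper()
    record["net_amount"] = record["amount"] - record.get("discount", 0)
    return record

# Inside the actual Glue job (not run here):
#   dyf = glueContext.create_dynamic_frame.from_catalog(
#       database="sales_db", table_name="raw_orders")     # hypothetical
#   transformed = dyf.map(enrich)
#   glueContext.write_dynamic_frame.from_jdbc_conf(
#       frame=transformed, catalog_connection="redshift-conn",  # hypothetical
#       connection_options={"dbtable": "orders", "database": "dw"})

print(enrich({"amount": 100, "discount": 15}))
```

Because the function is ordinary Python, it can be unit-tested outside Glue, which is a practical advantage over embedding logic in Hive scripts.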
6. Which AWS Glue feature automatically discovers and catalogs metadata from data sources?
A. Glue ETL jobs
B. Glue crawlers
C. Glue DataBrew
D. Glue Studio
Explanation: AWS Glue crawlers automatically scan various data stores (S3, JDBC sources, etc.), extract metadata, and populate the AWS Glue Data Catalog with table definitions. ETL jobs are for data transformation, not discovery. DataBrew is for visual data preparation. Studio is a visual ETL authoring interface.
7. A data engineer needs to transform JSON data to Parquet format and partition it by date for better query performance. Which AWS Glue feature is MOST suitable?
A. Glue crawler with custom classifiers
B. Glue ETL job with DynamicFrames
C. Glue DataBrew recipe
D. Glue Studio visual job
Explanation: AWS Glue ETL jobs with DynamicFrames provide built-in support for format conversion (JSON to Parquet) and partitioning. DynamicFrames handle schema evolution and provide easy methods for partitioning by columns like date. While Studio provides visual authoring, the actual transformation capability comes from the ETL job engine. DataBrew is for data cleaning, not format conversion.
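Date partitioning produces the Hive-style key layout that Glue writes when you pass partitionKeys, and that Athena prunes on at query time. A sketch of the resulting S3 prefix (bucket and table names are hypothetical):

```python
from datetime import date

# Sketch: Hive-style date partition prefix, the layout produced by
# writing with partitionKeys=["year", "month", "day"] in a Glue job.

def partition_prefix(bucket, table, d):
    return (f"s3://{bucket}/{table}/"
            f"year={d.year}/month={d.month:02d}/day={d.day:02d}/")

print(partition_prefix("analytics-lake", "clicks", date(2024, 1, 5)))
# s3://analytics-lake/clicks/year=2024/month=01/day=05/
```

Queries filtered on year, month, and day then scan only the matching prefixes instead of the whole table, which is the performance win the question is after.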
8. What is the purpose of AWS Glue bookmarks?
A. To save the code versions of ETL jobs
B. To track processed data and enable incremental processing
C. To mark tables for deletion in the Data Catalog
D. To create restore points for the Data Catalog
Explanation: AWS Glue job bookmarks track data that has already been processed during a previous run of an ETL job. This enables incremental processing, where subsequent job runs only process new or changed data. Bookmarks persist state across job runs, improving efficiency for large datasets.
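Bookmarks are switched on per job run through a job argument. The --job-bookmark-option values below are the real Glue argument names; the job name in the commented call is hypothetical:

```python
# Sketch: building the Glue job argument that controls job bookmarks.
# Valid modes map to job-bookmark-enable / -disable / -pause.

def bookmark_args(mode="enable"):
    assert mode in ("enable", "disable", "pause")
    return {"--job-bookmark-option": f"job-bookmark-{mode}"}

# With boto3 (not run here):
#   glue.start_job_run(JobName="nightly-etl",   # hypothetical job name
#                      Arguments=bookmark_args())
print(bookmark_args())  # {'--job-bookmark-option': 'job-bookmark-enable'}
```

The pause mode is worth remembering for the exam: it processes incrementally from the last bookmark without advancing it.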
9. A company uses Amazon DynamoDB and needs to capture item-level changes to replicate data to an S3 data lake in near real-time. Which solution is MOST appropriate?
A. Enable DynamoDB Streams and use AWS Lambda to write to S3
B. Use DynamoDB DAX and export tables daily
C. Create a CloudWatch alarm on write capacity and trigger Lambda
D. Use AWS DMS with continuous replication
Explanation: DynamoDB Streams captures item-level modifications in near real-time. Combined with Lambda, it provides a serverless solution to process these changes and write to S3. DAX is for caching, not change capture. CloudWatch alarms are not granular enough for item-level changes. DMS works but adds unnecessary complexity when native streaming is available.
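The Lambda half of this pattern receives batches of stream records in DynamoDB's typed-attribute format. A sketch of the handler shape (attribute and bucket names are hypothetical; the S3 write is shown commented out):

```python
import json

# Sketch: Lambda handler for a DynamoDB Streams -> S3 pipeline.
# Extracts each changed item's NewImage and the S3 key it would
# be written under.

def handler(event, context=None):
    written = []
    for rec in event["Records"]:
        if rec["eventName"] in ("INSERT", "MODIFY"):
            image = rec["dynamodb"]["NewImage"]     # typed attributes
            pk = image["pk"]["S"]                   # hypothetical key attr
            key = f"changes/{pk}/{rec['dynamodb']['SequenceNumber']}.json"
            # s3.put_object(Bucket="data-lake", Key=key,    # hypothetical
            #               Body=json.dumps(image).encode())
            written.append(key)
    return written

sample = {"Records": [{
    "eventName": "INSERT",
    "dynamodb": {"NewImage": {"pk": {"S": "device-42"}},
                 "SequenceNumber": "111"},
}]}
print(handler(sample))  # ['changes/device-42/111.json']
```

Keying objects by the item's partition key plus the stream sequence number keeps writes idempotent across Lambda retries.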
10. Which Amazon MSK feature helps automatically provision and manage Apache Kafka clusters?
A. Kafka Connect
B. Serverless cluster type
C. Kafka Streams API
D. Schema Registry
Explanation: Amazon MSK Serverless automatically provisions and scales Kafka clusters based on workload, eliminating the need to manage server capacity. Kafka Connect is for data integration, not cluster management. Kafka Streams is a client library for building applications. Schema Registry manages Avro schemas, not infrastructure.

About the AWS Data Engineer Exam

The AWS Certified Data Engineer – Associate (DEA-C01) validates your technical expertise in implementing data pipelines, monitoring, troubleshooting, and optimizing cost and performance of data solutions using AWS services. This certification is ideal for data engineers, data architects, and analytics professionals who design and manage data infrastructure on AWS.

  • Questions: 65 (50 scored + 15 unscored)
  • Time Limit: 2 hours 50 minutes (170 minutes)
  • Passing Score: 720/1000 (estimated)
  • Exam Fee: $150 (Amazon Web Services)

AWS Data Engineer Exam Content Outline

  • Data Ingestion and Transformation (34%): Kinesis, MSK, DMS, Glue, EMR, Lambda, Step Functions, Data Pipeline, batch and streaming ingestion, ETL transformation, data quality
  • Data Store Management (26%): S3, Redshift, Athena, DynamoDB, RDS, Lake Formation, Glue Data Catalog, OpenSearch, Neptune, data modeling, partitioning
  • Data Operations and Support (22%): CloudWatch, CloudTrail, monitoring, troubleshooting, cost optimization, performance tuning, backup, disaster recovery, high availability
  • Data Security and Governance (18%): IAM, KMS, Secrets Manager, Macie, Config, PrivateLink, encryption, compliance, data privacy, access control, audit logging

How to Pass the AWS Data Engineer Exam

What You Need to Know

  • Passing score: 720/1000 (estimated)
  • Exam length: 65 questions
  • Time limit: 2 hours 50 minutes
  • Exam fee: $150

Keys to Passing

  • Complete 500+ practice questions
  • Score 80%+ consistently before scheduling
  • Focus on highest-weighted sections
  • Use our AI tutor for tough concepts

AWS Data Engineer Study Tips from Top Performers

1. Focus on Domain 1 (Data Ingestion, 34%), the largest domain; master Kinesis (Streams vs Firehose), DMS, and Glue ETL
2. Know data storage patterns: S3 storage classes and lifecycle, Redshift distribution styles and sort keys, DynamoDB partition and sort keys
3. Understand orchestration options: Step Functions for workflow orchestration, MWAA for Airflow-based pipelines, EventBridge for event-driven pipelines
4. Master data transformation: Glue DynamicFrames, PySpark, format conversion (JSON to Parquet), partitioning strategies
5. Know monitoring and operations: CloudWatch metrics for Kinesis, Glue job monitoring, Redshift query monitoring, cost optimization techniques
6. Understand security best practices: encryption at rest (SSE-S3, SSE-KMS) and in transit, IAM policies for data access, Lake Formation permissions
7. Study data pipeline patterns: CDC with DMS, streaming ingestion with Kinesis, batch ETL with Glue, incremental processing with job bookmarks
8. Complete 200+ practice questions and score 80%+ consistently before scheduling the exam

Frequently Asked Questions

What is the AWS Data Engineer Associate pass rate?

The AWS Data Engineer Associate (DEA-C01) exam has an estimated pass rate of around 65%. AWS does not officially publish pass rates. You need an estimated scaled score of 720 out of 1000 to pass, with 65 questions (50 scored + 15 unscored) in 170 minutes. Most candidates with 1-2 years of hands-on AWS data engineering experience pass on their first attempt with thorough preparation.

How many questions are on the AWS Data Engineer Associate exam?

The DEA-C01 exam has 65 total questions: 50 scored questions and 15 unscored pretest questions. You have 170 minutes (2 hours 50 minutes) to complete the exam. Questions are either multiple choice (one correct answer) or multiple response (two or more correct answers). Approximately 60% of questions are scenario-based, presenting real-world data engineering challenges.

What are the four domains of the DEA-C01 exam?

The four exam domains are: Domain 1 – Data Ingestion and Transformation (34%): Kinesis, MSK, DMS, Glue, EMR, Lambda, Step Functions, batch and streaming ingestion, ETL transformation; Domain 2 – Data Store Management (26%): S3, Redshift, Athena, DynamoDB, RDS, Lake Formation, data modeling, partitioning; Domain 3 – Data Operations and Support (22%): CloudWatch, CloudTrail, monitoring, troubleshooting, cost optimization, backup, disaster recovery; Domain 4 – Data Security and Governance (18%): IAM, KMS, Macie, encryption, compliance, data privacy.

How long should I study for the AWS Data Engineer Associate exam?

Most candidates study for 6-10 weeks, investing 80-120 hours total. AWS recommends 2-3 years of data engineering experience with 1-2 years of hands-on AWS experience. Key study areas: 1) Data ingestion services (Kinesis, DMS, Glue). 2) Data storage services (S3, Redshift, DynamoDB). 3) ETL orchestration (Step Functions, MWAA). 4) Monitoring and operations (CloudWatch, CloudTrail). 5) Security and governance best practices. 6) Complete 200+ practice questions and score 80%+ on practice exams.

What AWS services are most important for the DEA-C01 exam?

Core services tested heavily: Data Ingestion (Kinesis Data Streams, Kinesis Firehose, DMS, Glue, EMR, Lambda); Data Storage (S3, Redshift, Athena, DynamoDB, RDS, Lake Formation); Orchestration (Step Functions, MWAA, EventBridge); Analytics (QuickSight, OpenSearch); Security (IAM, KMS, Macie, Secrets Manager); Monitoring (CloudWatch, CloudTrail). Understanding data pipeline architecture and when to use each service is critical.

What is the difference between Kinesis Data Streams and Kinesis Data Firehose?

Kinesis Data Streams is for real-time streaming with custom consumers, so you write your own processing code. It supports replay, allows multiple consumers, and provides per-shard ordering. Kinesis Data Firehose is a fully managed service for loading streaming data into destinations (S3, Redshift, OpenSearch Service, Splunk) without custom code. Firehose handles automatic scaling, batching, compression, and format conversion. Use Streams for custom processing; use Firehose for simple delivery to supported destinations.
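The comparison above reduces to a short decision rule, sketched here purely as an illustration of how the exam expects you to choose:

```python
# Sketch: decision rule for Streams vs Firehose, following the
# comparison above. Purely illustrative.

def pick_kinesis_service(needs_custom_processing, needs_replay,
                         multiple_consumers):
    if needs_custom_processing or needs_replay or multiple_consumers:
        return "Kinesis Data Streams"
    return "Kinesis Data Firehose"

print(pick_kinesis_service(False, False, False))  # Kinesis Data Firehose
print(pick_kinesis_service(True, False, False))   # Kinesis Data Streams
```

If the scenario only asks for delivery to a supported destination, Firehose is almost always the intended answer; any mention of replay, ordering, or multiple independent readers points to Streams.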

When should I use AWS Glue versus Amazon EMR?

Use AWS Glue for serverless ETL with minimal infrastructure management, especially for data cataloging, schema discovery, and Spark/Python-based transformations. Glue is ideal for simpler ETL workflows, data preparation, and integration with the Glue Data Catalog. Use Amazon EMR when you need full control over the cluster, support for specific Hadoop ecosystem tools, complex big data processing, machine learning with Spark MLlib, or when you need long-running clusters. EMR provides more flexibility but requires more management.

How does AWS Lake Formation work with S3 for data lakes?

AWS Lake Formation builds on S3 to provide centralized data lake management. It simplifies data ingestion, cataloging, cleaning, and transformation. Lake Formation provides fine-grained access control at database, table, column, and row levels across multiple analytics services (Athena, Redshift, EMR, QuickSight). It automates data cataloging with Glue crawlers, manages data permissions through a single interface, and enforces consistent security policies across your data lake without needing to configure S3 bucket policies for each service.