100+ Free Airflow DAG Authoring Practice Questions

Pass your Astronomer Certification: Apache Airflow DAG Authoring exam on the first try — instant access, no signup required.

✓ No registration ✓ No credit card ✓ No hidden fees ✓ Start practicing immediately

~65-75% Pass Rate

100+ Questions

100% Free

2026 Statistics

Key Facts: Airflow DAG Authoring Exam

Exam Questions: 75 (Astronomer)
Exam Duration: 60 min (Astronomer)
Passing Score: 75% (Astronomer)
Exam Fee: $150 (Astronomer)
Validity: 2 years (Astronomer)

75 questions, 60 minutes, 75% passing score, $150 fee. Advanced DAG authoring topics: TaskFlow API (@task/@dag decorators), dynamic task mapping (.expand()), Datasets and data-aware scheduling, BranchPythonOperator, trigger rules (all_success, one_failed, none_failed), TaskGroups, on_failure_callback, SLA misses, execution_timeout. Certification valid 2 years.

Sample Airflow DAG Authoring Practice Questions

Try these sample questions to test your Airflow DAG Authoring exam readiness. Each question includes a detailed explanation. Start the interactive quiz above for the full 100+ question experience with AI tutoring.

1. What is the primary advantage of using the @task decorator (TaskFlow API) over the classic PythonOperator?

A. The @task decorator allows tasks to run on separate Kubernetes pods automatically
B. The @task decorator eliminates boilerplate: return values automatically become XCom, and passing return values between @task functions creates implicit dependencies
C. The @task decorator runs tasks faster because it bypasses the metadata database
D. The @task decorator enables tasks to access global Python variables at parse time

Explanation: The TaskFlow API's @task decorator makes DAG authoring cleaner: return values are auto-pushed to XCom, passing a decorated function's return value as an argument to another creates an implicit upstream dependency, and there is no need for xcom_push/xcom_pull calls or explicit PythonOperator instantiation.

2. In the following TaskFlow API snippet, what is the task dependency created?

```python
@dag
def my_dag():
    data = extract()
    result = transform(data)
    load(result)
```

A. All three tasks run in parallel
B. extract → transform → load, with XCom values passed between them automatically
C. Only extract and load are connected; transform is independent
D. The dependencies must be explicitly set with >> operators as well

Explanation: In TaskFlow API, passing the return value of one @task function as an argument to another automatically creates both the task dependency and the XCom data flow. Here: extract() runs first, its return value is pushed to XCom and passed to transform(), which runs second, and its return value is passed to load().

3. How do you mix a classic BashOperator with a TaskFlow API @task function and create a dependency where the @task runs after the BashOperator?

A. It is not possible to mix classic operators with TaskFlow API in the same DAG
B. Use the >> operator: bash_task >> taskflow_func()
C. Wrap the BashOperator in a @task decorator to make it compatible
D. Use the depends_on=[bash_task] parameter in the @task decorator

Explanation: Classic operators and TaskFlow @task functions can coexist in the same DAG. Calling a @task-decorated function inside a DAG does not run it; it returns an XComArg object, which is a reference to the task and its future return value. You can create dependencies with >> between classic operators and TaskFlow tasks: bash_task >> my_taskflow_function(). The XComArg from the function can also be passed to other tasks.

4. What does the @task.branch decorator (TaskFlow API) return, and what effect does it have on downstream tasks?

A. It returns a boolean; True continues all downstream tasks, False skips them
B. It returns a task_id string (or list of task_ids); only those tasks run, the rest are marked skipped
C. It returns an XCom value that BranchPythonOperator reads to decide the branch
D. It returns a DAG run configuration that changes the schedule

Explanation: @task.branch is the TaskFlow equivalent of BranchPythonOperator. The decorated function must return the task_id (or a list of task_ids) of the directly downstream tasks that should run. The other directly downstream tasks are marked as skipped, and that skip propagates along their branches according to each task's trigger rule. This allows conditional branching with the cleaner TaskFlow syntax.

5. A DAG uses BranchPythonOperator to choose between branch_a and branch_b. A final 'notify' task must run regardless of which branch executes. What trigger_rule should 'notify' have?

A. all_success (default)
B. all_done
C. none_failed_min_one_success
D. none_failed

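Explanation: After BranchPythonOperator, the tasks on the unchosen branch are marked 'skipped'. With the default all_success, 'notify' would itself be skipped, because one of its upstream tasks was skipped rather than successful. trigger_rule='none_failed' lets 'notify' run as long as no upstream task failed, since skipped tasks do not count as failures. This is a common join pattern after branching. Note that none_failed_min_one_success (option C) also lets a join run after a branch and additionally requires at least one upstream success; all_done (option B) would run 'notify' even when a branch task failed.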

6. What is the Airflow ShortCircuitOperator and when does it skip downstream tasks?

A. It always skips all downstream tasks regardless of condition
B. It evaluates a Python callable; if it returns False, all downstream tasks are skipped
C. It skips tasks that exceed their execution_timeout
D. It is a deprecated alias for BranchPythonOperator

Explanation: ShortCircuitOperator evaluates the return value of a Python callable. If the value is truthy, the DAG continues normally. If it is falsy, all downstream tasks are short-circuited (marked as 'skipped') by default and the pipeline stops at that point. Unlike BranchPythonOperator, there is no 'chosen branch'; the decision is a binary continue or stop.

7. What does the .expand() method on an Airflow operator do?

A. Expands the operator's timeout to accommodate longer running tasks
B. Creates multiple mapped task instances at runtime based on a list of inputs
C. Expands the operator class to inherit from multiple parent operators
D. Generates multiple DAG runs for different time ranges

Explanation: .expand() is Airflow's dynamic task mapping feature. Instead of creating tasks in a Python loop at parse time, .expand() receives an iterable and creates one task instance per element at runtime. For example, process_file.expand(filename=files_list) creates one task per file, with the count determined when the DAG runs.

8. You need to process a dynamic list of S3 files, but all tasks should use the same S3 connection ID. How do you use .partial() with .expand()?

A. process_file.expand(filename=files, aws_conn_id='my_conn')
B. process_file.partial(aws_conn_id='my_conn').expand(filename=files)
C. process_file.expand(filename=files).partial(aws_conn_id='my_conn')
D. process_file.expand(filename=files, static=dict(aws_conn_id='my_conn'))

Explanation: .partial() sets fixed (non-expanding) parameters that are the same across all mapped instances, while .expand() sets the parameter that varies per instance. The correct call chain is operator.partial(fixed_params).expand(varying_param=list). This separates static config from dynamic inputs.

9. What is an Airflow Dataset, and how does it enable data-aware scheduling?

A. A Dataset is a table in the Airflow metadata database that stores processed output data
B. A Dataset is a logical URI representing a data asset; DAGs can be scheduled to run when a dataset is updated by a producer DAG
C. A Dataset is an Airflow Variable that tracks the last modified time of a file
D. A Dataset is a special Sensor that monitors S3 for new files

Explanation: Datasets (introduced in Airflow 2.4) are logical identifiers for data assets (e.g., Dataset('s3://my-bucket/data')). Producer DAGs mark their outputs with outlets=[my_dataset]. Consumer DAGs use schedule=[my_dataset] to run automatically when the dataset is updated. This enables event-driven, data-aware pipeline orchestration.

10. How does a producer DAG mark its output as a Dataset update in Airflow?

A. By calling Dataset.publish(uri='s3://bucket/data') at the end of the last task
B. By setting outlets=[Dataset('s3://bucket/data')] on the task that produces the data
C. By writing the dataset URI to an Airflow Variable named 'datasets_updated'
D. By adding schedule=[Dataset('...')] to both the producer and consumer DAG

Explanation: To mark a dataset as updated, set outlets=[Dataset('uri')] on the operator that produces the data. When that task completes successfully, Airflow records the dataset update. Consumer DAGs with schedule=[Dataset('uri')] will be triggered by this event. Only the consumer uses schedule=[Dataset(...)]; the producer uses outlets.

About the Airflow DAG Authoring Exam

The Astronomer Certification: Apache Airflow DAG Authoring validates advanced skills in designing, authoring, and optimizing Apache Airflow DAGs for production use. It covers the TaskFlow API (@task decorator), dynamic task mapping (.expand()), Datasets for data-aware scheduling, BranchPythonOperator, trigger rules, TaskGroups, error handling callbacks, and production DAG best practices.

Questions: 75 scored questions
Time Limit: 60 minutes
Passing Score: 75% (56/75)
Exam Fee: $150 (Astronomer)

Airflow DAG Authoring Exam Content Outline

TaskFlow API and Decorators (20%)
@task and @dag decorators, automatic XCom via return values, dependency inference, mixing TaskFlow with classic operators, @task.branch

Operators and Trigger Rules (20%)
Sensor modes (poke vs reschedule), trigger rules: all_success (default), one_failed, all_failed, all_done, none_failed, none_skipped, one_success

Branching and Dynamic Task Mapping (20%)
BranchPythonOperator, ShortCircuitOperator, .expand() for fan-out tasks, .partial().expand() for mixed parameters

Datasets and Data-Aware Scheduling (15%)
Dataset definition and URIs, outlets in operators, schedule=[Dataset(...)], consumer DAG triggering, Dataset UI

TaskGroups (10%)
TaskGroup context manager, nested TaskGroups, dependencies between TaskGroups, UI visualization

Error Handling and SLAs (15%)
on_failure_callback, on_retry_callback, on_success_callback, SLA miss callbacks, retries, retry_delay, retry_exponential_backoff, execution_timeout

How to Pass the Airflow DAG Authoring Exam

What You Need to Know

Passing score: 75% (56/75)
Exam length: 75 questions
Time limit: 60 minutes
Exam fee: $150
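
As a quick pacing check, the figures above can be turned into a time budget per question. This is plain arithmetic on the numbers listed here; note that 75% of 75 is 56.25, so the 56/75 shown above works out to slightly under 75%, and how the exam rounds that threshold is not stated on this page.

```python
import math

questions = 75
time_limit_minutes = 60
passing_fraction = 0.75

# Average time available per question.
seconds_per_question = time_limit_minutes * 60 / questions

# Exact threshold implied by 75% of 75 questions.
exact_threshold = passing_fraction * questions

# The two candidate cut-offs, depending on rounding.
rounded_down = math.floor(exact_threshold)
rounded_up = math.ceil(exact_threshold)

# Percentage score when 56 of 75 answers are correct.
percent_at_56 = round(56 / questions * 100, 1)

print(seconds_per_question, exact_threshold, rounded_down, rounded_up, percent_at_56)
```

At 48 seconds per question there is little room to revisit answers, so aiming for a margin above the threshold is a reasonable plan.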

Keys to Passing

Complete 500+ practice questions
Score 80%+ consistently before scheduling
Focus on highest-weighted sections
Use our AI tutor for tough concepts

Airflow DAG Authoring Study Tips from Top Performers

1. Write TaskFlow API DAGs from scratch: practice the @task decorator and how return values create XCom and dependencies

2. Know ALL trigger rules and the use case for each, especially none_failed and none_skipped

3. Understand .expand() deeply: how it creates mapped task instances and how to use .partial() for fixed params

4. Learn Dataset URIs and the difference between a producer (outlets=[]) and a consumer (schedule=[Dataset(...)])

5. Know when to choose poke vs reschedule sensor mode

6. Understand TaskGroup vs SubDAG: SubDAGs are deprecated; TaskGroups are the modern approach

7. Practice writing on_failure_callback functions and understand when they are called vs retry callbacks

Frequently Asked Questions

What is the @task decorator in Airflow's TaskFlow API?

The @task decorator transforms a regular Python function into an Airflow task. The decorated function's return value is automatically pushed to XCom, and passing the return value of one @task function into another automatically creates a task dependency. This eliminates boilerplate PythonOperator instantiation and explicit xcom_push/xcom_pull calls.

When should I use trigger_rule='none_failed'?

Use trigger_rule='none_failed' when a downstream task should run as long as all upstream tasks either succeeded or were skipped — but not if any failed. This is commonly used after BranchPythonOperator branches to run a join task regardless of which branch executed, without being blocked by skipped branch tasks.
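
To compare rules side by side, the following is a simplified plain-Python model, not Airflow's implementation. It assumes every upstream task has already finished, and the function name `should_run` is hypothetical.

```python
def should_run(trigger_rule, upstream_states):
    """Return True if a task with this rule would run (simplified model)."""
    total = len(upstream_states)
    succeeded = upstream_states.count("success")
    skipped = upstream_states.count("skipped")
    failed = sum(s in ("failed", "upstream_failed") for s in upstream_states)

    rules = {
        "all_success": succeeded == total,
        "all_failed": failed == total,
        "all_done": True,  # every upstream is finished by assumption
        "one_failed": failed >= 1,
        "one_success": succeeded >= 1,
        "none_failed": failed == 0,
        "none_failed_min_one_success": failed == 0 and succeeded >= 1,
        "none_skipped": skipped == 0,
    }
    return rules[trigger_rule]


# Upstream states of a join task after a branch picked one of two paths.
after_branch = ["success", "skipped"]
print(should_run("all_success", after_branch))
print(should_run("none_failed", after_branch))
```

In this model the join does not run under `all_success` but does run under `none_failed`, which matches the branching use case described above.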

What is the difference between .expand() and creating tasks in a loop?

.expand() creates tasks at runtime based on a dynamic list, meaning the number of tasks is only known when the DAG runs. Creating tasks in a Python loop creates them at DAG parse time, which is static. .expand() is preferred for dynamically-sized input lists and enables better task-level tracking in the Airflow UI.

What is the poke vs reschedule mode for sensors?

In poke mode, a sensor holds a worker slot while actively polling at poke_interval. In reschedule mode, the sensor releases its worker slot between checks, freeing resources. reschedule mode is preferred for long-running sensors (hours) to avoid exhausting the worker pool.
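
A back-of-the-envelope model of the difference, in plain Python rather than Airflow code. It assumes a sensor that waits six hours, a `poke_interval` of five minutes, and two seconds per check; all three figures and the function name `slot_seconds` are illustrative assumptions.

```python
def slot_seconds(mode, wait_seconds, poke_interval, check_seconds):
    """Worker-slot time used by one sensor under a simplified model."""
    if mode == "poke":
        # The slot is held for the whole wait, including the sleeps.
        return wait_seconds
    if mode == "reschedule":
        # The slot is held only while each check is running.
        checks = wait_seconds // poke_interval + 1
        return checks * check_seconds
    raise ValueError(f"unknown mode: {mode}")


wait = 6 * 60 * 60  # six hours, in seconds
poke_usage = slot_seconds("poke", wait, poke_interval=300, check_seconds=2)
reschedule_usage = slot_seconds("reschedule", wait, poke_interval=300, check_seconds=2)
print(poke_usage, reschedule_usage)
```

Under these assumptions the poke-mode sensor occupies a slot for 21,600 seconds, while the reschedule-mode sensor occupies one for 146 seconds in total. For short waits with frequent checks, the scheduling overhead of reschedule mode can outweigh this saving.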