Career upgrade: Learn practical AI skills for better jobs and higher pay.
Level up
All Practice Exams

100+ Free ClickHouse Certified Practice Questions

Pass your ClickHouse Certified Developer exam on the first try — instant access, no signup required.

✓ No registration✓ No credit card✓ No hidden fees✓ Start practicing immediately
100+ Questions
100% Free
1 / 100
Question 1
Score: 0/0

Which ClickHouse sampling clause allows you to query a deterministic fraction of data for approximate query processing?

A
B
C
D
to track
2026 Statistics

Key Facts: ClickHouse Certified Exam

70%

Passing Score

ClickHouse

$200

Exam Fee

ClickHouse

2 hours

Exam Duration

ClickHouse

10-12

Hands-on Tasks

ClickHouse

365 days

Attempt Validity

ClickHouse

HackerRank

Exam Platform

ClickHouse

The ClickHouse Certified Developer is the first official certification from ClickHouse, the world's fastest open-source OLAP database. The exam is a 2-hour hands-on assessment of 10-12 real-world tasks on ClickHouse clusters, delivered via HackerRank. Topics span MergeTree engine design, query optimization, Kafka/S3 ingestion, replication with ClickHouse Keeper, and advanced analytics patterns.

Sample ClickHouse Certified Practice Questions

Try these sample questions to test your ClickHouse Certified exam readiness. Each question includes a detailed explanation. Start the interactive quiz above for the full 100+ question experience with AI tutoring.

1Which storage model does ClickHouse use to achieve high analytical query performance on large datasets?
A.Row-oriented storage with B-tree indexes
B.Columnar storage with vectorized execution
C.Document storage with JSON compression
D.Wide-column storage with LSM trees
Explanation: ClickHouse uses columnar storage, meaning each column is stored separately on disk. This allows analytical queries that read only a subset of columns to avoid loading irrelevant data. Combined with vectorized execution (processing data in SIMD batches), this achieves extremely high throughput for OLAP workloads. Row-oriented storage is typical of OLTP databases like PostgreSQL, which are optimized for retrieving entire rows.
2In ClickHouse, what is a 'granule' in the context of MergeTree storage?
A.A single row in a MergeTree data part
B.A block of rows (default 8,192) that forms the unit of index granularity
C.A compressed block of 65,536 rows written to disk at once
D.A shard partition containing data for one server
Explanation: A granule is the fundamental unit of index granularity in MergeTree. By default, index_granularity = 8192, meaning one entry in the sparse primary index is written for every 8,192 rows. When ClickHouse reads data, it identifies which granules might contain matching rows and reads only those granule blocks from disk. This enables efficient data skipping without reading the entire column.
3What is the relationship between the ORDER BY clause and the PRIMARY KEY in a MergeTree table when PRIMARY KEY is not explicitly specified?
A.The primary key defaults to the first column regardless of ORDER BY
B.The primary key is automatically set equal to the ORDER BY expression
C.A table without an explicit PRIMARY KEY has no sparse index
D.PRIMARY KEY must always be specified separately from ORDER BY
Explanation: When you define a MergeTree table with ORDER BY but omit PRIMARY KEY, ClickHouse automatically uses the ORDER BY columns as both the sort key and the primary key. The ORDER BY defines the physical sort order of the data on disk, and the sparse primary index is built on those same columns. You can specify a PRIMARY KEY that is a prefix of ORDER BY when you want a smaller index (fewer columns in the index) while retaining a longer sort key.
4A MergeTree table is defined with ORDER BY (tenant_id, event_date, user_id). A query filters only on user_id. What does ClickHouse do with the primary index for this query?
A.Uses the primary index efficiently since user_id is part of the key
B.Cannot use the primary index at all because user_id is not the first key column
C.Automatically reorders the index to match the query filter
D.Performs a hash join against the index to find matching user_ids
Explanation: ClickHouse primary indexes are sparse and sorted by the full ORDER BY expression. Filtering on a non-leading column (user_id here) cannot prune granules via the primary index because the index entries are sorted by tenant_id first, then event_date, then user_id. For such queries, ClickHouse must perform a full scan of all granules. Data skipping indexes (like a bloom_filter on user_id) or projections can address this limitation.
5What is the purpose of PARTITION BY in a MergeTree table?
A.It defines the sort order of data within each data part
B.It splits data into separate directories, enabling partition pruning for WHERE filters
C.It determines which shard a row is routed to in a distributed cluster
D.It sets the primary key prefix used for the sparse index
Explanation: PARTITION BY physically separates data into distinct directories (partitions) on disk. When a WHERE clause contains a condition on the partition key, ClickHouse can skip entire partitions without reading them. This is partition pruning. A common pattern is PARTITION BY toYYYYMM(event_date), which creates one partition per month. Note: very high cardinality PARTITION BY expressions create too many small parts and degrade performance.
6Which ClickHouse MergeTree engine variant removes duplicate rows with the same sorting key during background merge operations?
A.SummingMergeTree
B.CollapsingMergeTree
C.ReplacingMergeTree
D.AggregatingMergeTree
Explanation: ReplacingMergeTree deduplicates rows with the same ORDER BY (sorting key) during background merges. If a version column is specified, it retains the row with the highest version; otherwise it keeps the last inserted row. Because deduplication happens only during merges (which are asynchronous), you must use FINAL or deduplicate at query time to guarantee reading only deduplicated data.
7When using ReplacingMergeTree, a SELECT query returns duplicate rows even though they should have been removed. What is the most likely explanation?
A.The version column was not specified correctly
B.Background merges have not yet run, so deduplication has not occurred
C.ReplacingMergeTree requires a FINAL keyword in the CREATE TABLE statement
D.Duplicates can only be removed with an explicit OPTIMIZE TABLE command
Explanation: ReplacingMergeTree deduplication happens asynchronously during background merge operations. At any given moment, a table may have multiple data parts containing rows with the same sorting key. Until those parts are merged (either by background processes or by running OPTIMIZE TABLE FINAL), duplicate rows remain visible in SELECT queries. To force deduplicated results at query time, use SELECT ... FROM table FINAL.
8SummingMergeTree collapses rows with the same sorting key by summing numeric columns. What happens to non-numeric columns that are not part of the sorting key during a merge?
A.They are averaged across collapsed rows
B.An arbitrary value is kept (typically from the first row in the merge)
C.Non-numeric columns cause the merge to fail with an error
D.They are concatenated into an array
Explanation: For non-numeric columns (strings, dates, etc.) that are not part of the PRIMARY KEY/ORDER BY, SummingMergeTree keeps an arbitrary value — typically from one of the rows being merged (implementation may vary). This is by design: SummingMergeTree is intended for metric/counter data where only numeric aggregations matter. If non-numeric column values are important, consider a different table engine or a materialized view.
9AggregatingMergeTree stores aggregate function states in columns. Which ClickHouse data type is used to store such a state?
A.Nested(state String)
B.AggregateFunction(func_name, arg_types...)
C.Array(Float64)
D.Tuple(count UInt64, sum Float64)
Explanation: The AggregateFunction(func_name, arg_types...) data type is used to store the intermediate (partial) state of an aggregate function. For example, AggregateFunction(uniq, UInt64) stores the HyperLogLog sketch for counting distinct UInt64 values. These states are merged using the -Merge combinator (e.g., uniqMerge) and inserted using the -State combinator (e.g., uniqState). This pattern is central to AggregatingMergeTree and materialized view pre-aggregation.
10CollapsingMergeTree uses a sign column with values +1 and -1. What does inserting a row with sign = -1 accomplish?
A.Marks the row for deletion in the next OPTIMIZE TABLE run
B.Cancels out a previously inserted row with sign = +1 that has the same sorting key
C.Marks the row as a soft delete visible only with SHOW DELETED ROWS
D.Decrements a counter column by the numeric value of the row
Explanation: CollapsingMergeTree implements mutable data through row cancellation. To 'delete' or 'update' a row: first insert a cancellation row with the same sorting key values and sign = -1, then (for updates) insert the new state with sign = +1. During background merges, pairs of +1 and -1 rows with the same sorting key cancel each other out. This allows efficient mutable state without expensive in-place updates.

About the ClickHouse Certified Exam

The ClickHouse Certified Developer certification validates expertise in ClickHouse including table engine selection, data modeling, query optimization, materialized views, replication, sharding, and performance tuning on real-world cluster tasks.

Questions

100 scored questions

Time Limit

2 hours

Passing Score

70%

Exam Fee

$200 (ClickHouse)

ClickHouse Certified Exam Content Outline

25%

Table Engines & Data Modeling

MergeTree family, ORDER BY, PRIMARY KEY, PARTITION BY, TTL, and denormalization

25%

Query Optimization

Sparse indexes, skip indexes, PREWHERE, projections, materialized views, and sampling

20%

Data Ingestion & Formats

INSERT patterns, Kafka engine, S3 table function/engine, bulk loading, and async inserts

15%

Replication & Sharding

ReplicatedMergeTree, ClickHouse Keeper, Distributed tables, and cluster operations

15%

Aggregate Functions & Analytics

Aggregate functions, combinators, dictionaries, TTL, and advanced SQL features

How to Pass the ClickHouse Certified Exam

What You Need to Know

  • Passing score: 70%
  • Exam length: 100 questions
  • Time limit: 2 hours
  • Exam fee: $200

Keys to Passing

  • Complete 500+ practice questions
  • Score 80%+ consistently before scheduling
  • Focus on highest-weighted sections
  • Use our AI tutor for tough concepts

ClickHouse Certified Study Tips from Top Performers

1Set up a free ClickHouse Cloud account or local Docker instance and practice all MergeTree engine variants (ReplacingMergeTree, SummingMergeTree, AggregatingMergeTree, CollapsingMergeTree)
2Master the ORDER BY / PRIMARY KEY / PARTITION BY relationship — this is central to most exam tasks
3Build a materialized view pipeline: Kafka engine → MergeTree destination → AggregatingMergeTree summary
4Practice query optimization: add skip indexes, write PREWHERE explicitly, and use EXPLAIN PIPELINE to understand execution
5Understand the Distributed + ReplicatedMergeTree architecture and how sharding_key affects data distribution

Frequently Asked Questions

What format is the ClickHouse Certified Developer exam?

The exam is a hands-on performance-based assessment with 10-12 real-world tasks to complete in 2 hours. You work directly on ClickHouse clusters via the HackerRank platform on your own computer. There are no multiple-choice questions — tasks require writing actual ClickHouse SQL and DDL. The passing score is 70%, with results taking up to 5 business days.

How much does the ClickHouse Certified Developer exam cost?

Each exam attempt costs $200 USD. The attempt is valid for 365 days from purchase. ClickHouse does not publish retake waiting periods, so candidates can repurchase and reattempt as needed.

What training is recommended before the ClickHouse Certified Developer exam?

ClickHouse recommends the free 'Real-time Analytics with ClickHouse' training course, available on-demand or instructor-led at clickhouse.com/learn. The course covers all exam objectives including table engines, data ingestion, query optimization, and cluster operations.

Do these practice questions match the actual hands-on exam format?

The actual exam requires performing tasks on live ClickHouse clusters. These practice questions are knowledge-based multiple-choice questions that test the same concepts, syntax, and decision-making skills tested by the hands-on tasks. Mastering these questions builds the understanding needed to successfully complete the real hands-on tasks.

Which ClickHouse topics are most important for the certification?

Focus on MergeTree engine selection and configuration (ORDER BY, PRIMARY KEY, PARTITION BY), materialized views for pre-aggregation, query optimization (PREWHERE, skip indexes, projections), aggregate functions and combinators, and Kafka/S3 data ingestion. Hands-on practice with real ClickHouse clusters is essential given the exam's performance-based format.