3.3 Choose the Right Store: Lakehouse vs Warehouse vs Eventhouse
Key Takeaways
- Lakehouse: open Delta/Parquet files in OneLake, edited primarily with Spark/notebooks, with a read-only SQL analytics endpoint for T-SQL queries.
- Warehouse: a fully relational, T-SQL read-write store supporting INSERT/UPDATE/DELETE, multi-table transactions, views, and stored procedures.
- Eventhouse: a KQL-based store optimized for high-volume streaming, telemetry, log, and time-series analytics, queried with KQL.
- The SQL analytics endpoint over a lakehouse is read-only; if a scenario needs T-SQL writes, multi-table transactions, or stored procedures, choose a warehouse.
- Match the store to the workload and the dominant language: Spark/open files to lakehouse, relational T-SQL to warehouse, real-time KQL telemetry to eventhouse.
The Most-Tested Decision on the Exam
Expect multiple scenario questions that hand you a workload and ask which Fabric store to use. The reliable mental model: what does the data look like, who edits it, and which language dominates?
- Lakehouse — open Delta Lake/Parquet tables and files in OneLake. Engineers transform with Spark/notebooks and Dataflow Gen2. Every lakehouse automatically exposes a SQL analytics endpoint that is read-only T-SQL over those tables. Choose it for open-format data, data engineering, ML feature prep, and unstructured/semi-structured files.
- Warehouse — a fully relational, read-write T-SQL store. Supports
INSERT/UPDATE/DELETE, multi-table transactions, views, and stored procedures. Choose it for classic relational analytics, dimensional modeling owned by SQL developers, and workloads needing T-SQL DML. - Eventhouse — built on the KQL engine; optimized for high-volume streaming, telemetry, logs, IoT, and time-series. Queried with KQL (and a SQL surface for some operations). Choose it whenever the scenario stresses real-time ingestion and event/time-series analytics.
Decision Table
| Criterion | Lakehouse | Warehouse | Eventhouse |
|---|---|---|---|
| Data shape | Open Delta/Parquet, files | Relational tables | Streaming events, time-series |
| Primary language | Spark (PySpark/Spark SQL) | T-SQL | KQL |
| Write access | Spark writes; SQL endpoint read-only | Full T-SQL read-write DML | Streaming ingest; KQL |
| Multi-table transactions | No (Delta table-level) | Yes | No |
| Stored procedures | No (notebooks instead) | Yes | KQL functions |
| Best for | Data engineering, ML, open files | Relational BI, SQL DML | Real-time/telemetry analytics |
| Typical persona | Data engineer | SQL/BI developer | Real-time analyst |
The Read-Only Endpoint Trap
The most common distractor on this topic: assuming the lakehouse SQL analytics endpoint can run T-SQL writes. It cannot — it is read-only over the Delta tables. To modify lakehouse data you use Spark, notebooks, or Dataflow Gen2. If the scenario requires UPDATE/DELETE, multi-table transactions, or stored procedures executed in T-SQL, the correct store is a warehouse.
A second trap: choosing a warehouse for streaming telemetry because it is "a database." High-ingest event and time-series workloads belong in an eventhouse, where the KQL engine and ingestion pipeline are optimized for that pattern. And a warehouse is not an eventhouse: T-SQL relational analytics vs. KQL telemetry are different engines for different jobs.
Cross-Store Reality with OneLake
Because lakehouses and warehouses both store data as Delta in OneLake, you can often query across them and reuse data with shortcuts instead of duplicating it. Still, the exam wants the primary store chosen by the dominant workload and language — not "all three." Read the scenario for the load pattern (batch files vs. relational DML vs. streaming) and the persona doing the work.
A finance team must run nightly relational ETL that performs multi-table transactions, UPDATE and DELETE statements, views, and stored procedures, all in T-SQL, owned by SQL developers. Which Fabric store should they use?
An IoT platform streams millions of device telemetry records per hour. Analysts need to run time-series and log-pattern queries with low latency and want to use a query language designed for this kind of data. Which store and language fit best?