3.3 Choose the Right Store: Lakehouse vs Warehouse vs Eventhouse

Key Takeaways

  • Lakehouse: open Delta/Parquet files in OneLake, edited primarily with Spark/notebooks, with a read-only SQL analytics endpoint for T-SQL queries.
  • Warehouse: a fully relational, T-SQL read-write store supporting INSERT/UPDATE/DELETE, multi-table transactions, views, and stored procedures.
  • Eventhouse: a KQL-based store optimized for high-volume streaming, telemetry, log, and time-series analytics, queried with KQL.
  • The SQL analytics endpoint over a lakehouse is read-only; if a scenario needs T-SQL writes, multi-table transactions, or stored procedures, choose a warehouse.
  • Match the store to the workload and the dominant language: Spark/open files to lakehouse, relational T-SQL to warehouse, real-time KQL telemetry to eventhouse.
Last updated: May 2026

The Most-Tested Decision on the Exam

Expect multiple scenario questions that hand you a workload and ask which Fabric store to use. The reliable mental model: what does the data look like, who edits it, and which language dominates?

  • Lakehouse — open Delta Lake/Parquet tables and files in OneLake. Engineers transform with Spark/notebooks and Dataflow Gen2. Every lakehouse automatically exposes a SQL analytics endpoint that is read-only T-SQL over those tables. Choose it for open-format data, data engineering, ML feature prep, and unstructured/semi-structured files.
  • Warehouse — a fully relational, read-write T-SQL store. Supports INSERT/UPDATE/DELETE, multi-table transactions, views, and stored procedures. Choose it for classic relational analytics, dimensional modeling owned by SQL developers, and workloads needing T-SQL DML.
  • Eventhouse — built on the KQL engine; optimized for high-volume streaming, telemetry, logs, IoT, and time-series. Queried with KQL (and a SQL surface for some operations). Choose it whenever the scenario stresses real-time ingestion and event/time-series analytics.

Decision Table

CriterionLakehouseWarehouseEventhouse
Data shapeOpen Delta/Parquet, filesRelational tablesStreaming events, time-series
Primary languageSpark (PySpark/Spark SQL)T-SQLKQL
Write accessSpark writes; SQL endpoint read-onlyFull T-SQL read-write DMLStreaming ingest; KQL
Multi-table transactionsNo (Delta table-level)YesNo
Stored proceduresNo (notebooks instead)YesKQL functions
Best forData engineering, ML, open filesRelational BI, SQL DMLReal-time/telemetry analytics
Typical personaData engineerSQL/BI developerReal-time analyst
Loading diagram...
Store selection flow

The Read-Only Endpoint Trap

The most common distractor on this topic: assuming the lakehouse SQL analytics endpoint can run T-SQL writes. It cannot — it is read-only over the Delta tables. To modify lakehouse data you use Spark, notebooks, or Dataflow Gen2. If the scenario requires UPDATE/DELETE, multi-table transactions, or stored procedures executed in T-SQL, the correct store is a warehouse.

A second trap: choosing a warehouse for streaming telemetry because it is "a database." High-ingest event and time-series workloads belong in an eventhouse, where the KQL engine and ingestion pipeline are optimized for that pattern. And a warehouse is not an eventhouse: T-SQL relational analytics vs. KQL telemetry are different engines for different jobs.

Cross-Store Reality with OneLake

Because lakehouses and warehouses both store data as Delta in OneLake, you can often query across them and reuse data with shortcuts instead of duplicating it. Still, the exam wants the primary store chosen by the dominant workload and language — not "all three." Read the scenario for the load pattern (batch files vs. relational DML vs. streaming) and the persona doing the work.

Test Your Knowledge

A finance team must run nightly relational ETL that performs multi-table transactions, UPDATE and DELETE statements, views, and stored procedures, all in T-SQL, owned by SQL developers. Which Fabric store should they use?

A
B
C
D
Test Your Knowledge

An IoT platform streams millions of device telemetry records per hour. Analysts need to run time-series and log-pattern queries with low latency and want to use a query language designed for this kind of data. Which store and language fit best?

A
B
C
D