5.1 Modern Data Warehouse and Lakehouse

Key Takeaways

Every Azure analytics workload follows the same five stages: ingest, store, process, serve, and visualize.
Data warehouses use schema-on-write for fast BI; data lakes use schema-on-read for cheap raw storage; lakehouses combine both using Delta Lake on Parquet.
OneLake is the single, tenant-wide data lake automatically provisioned with every Microsoft Fabric tenant, storing all items as Delta-Parquet.
OneLake Shortcuts let Fabric query data in ADLS Gen2, Amazon S3, or Google Cloud Storage in place without copying it.
The medallion architecture organizes lakehouse data into bronze (raw), silver (cleansed), and gold (curated business-ready) zones.

Last updated: June 2026

Modern Data Warehouse and Lakehouse

Quick Answer: A modern data warehouse on Azure combines a data lake (cheap, schema-on-read storage for raw and refined files) with a SQL engine (structured, governed serving layer). The lakehouse pattern collapses both layers into one open storage tier — typically Delta Lake on Azure Data Lake Storage Gen2 or OneLake in Microsoft Fabric — so warehouse-style SQL, Spark notebooks, and BI tools all query the same files.

For the DP-900 exam, you need to recognize the five logical stages of any analytics platform, how the three common storage patterns differ, and where Microsoft's current services fit each stage.

The Five-Stage Reference Architecture

Microsoft documents every analytics workload as a pipeline of five logical stages. Knowing which Azure service plays which role is the single most useful mental model for the exam.

Stage	Purpose	Common Azure services (2026)
1. Ingest	Pull data from operational systems and streams	Azure Data Factory, Synapse Pipelines, Fabric Data Factory, Event Hubs, IoT Hub
2. Store	Land raw and curated data cheaply and durably	Azure Data Lake Storage Gen2, OneLake, Blob Storage
3. Process / Transform	Clean, join, aggregate, and model the data	Synapse Spark pools, Azure Databricks, Fabric Data Engineering, dataflows
4. Serve / Model	Expose governed, query-ready datasets	Synapse dedicated SQL pool, Fabric Warehouse, Fabric Lakehouse SQL endpoint
5. Visualize / Consume	Deliver insight to humans and apps	Power BI, Excel, custom apps via REST/ODBC

This same flow appears in Microsoft's reference architectures for the modern data warehouse, advanced analytics, and real-time analytics scenarios — only the service mix changes.

Three Storage Patterns You Must Distinguish

Data warehouse — A relational store optimized for analytical queries. Data is schema-on-write: it is cleaned, modeled into star or snowflake schemas, and loaded before it can be queried. Strong governance and BLAZING query speed on structured data, but loading is expensive and unstructured data (images, audio, logs) does not fit naturally.

Data lake — A flat folder hierarchy that stores files in any format (CSV, JSON, Parquet, images). It is schema-on-read: structure is applied only when something queries the file. Cheap, infinitely scalable, and friendly to data scientists, but without governance a lake degrades into a data swamp with duplicate, untrusted files.

Lakehouse — A single tier that gives you lake economics and warehouse semantics. It is built on an open table format — Delta Lake (the default in Microsoft Fabric and Databricks), Apache Iceberg, or Apache Hudi — that adds ACID transactions, time travel, schema enforcement, and indexing on top of Parquet files in the lake. SQL engines and Spark engines query the same files.

Capability	Warehouse	Lake	Lakehouse
Stores unstructured data	No	Yes	Yes
ACID transactions	Yes	No	Yes
Schema enforcement	Yes	No	Optional
BI tool friendly	Excellent	Poor	Good
Data science / ML friendly	Limited	Excellent	Excellent
Storage cost per TB	High	Low	Low
Open format (no vendor lock-in)	No	Yes	Yes

OneLake: One Lake for the Whole Tenant

OneLake is Microsoft Fabric's built-in, tenant-wide data lake. It is automatically provisioned with every Fabric tenant — there is exactly one OneLake per tenant, much like there is one OneDrive per user. Inside OneLake, data is organized into workspaces and then into items (lakehouses, warehouses, KQL databases). All items store data as Delta-Parquet files under the hood, so any Fabric engine — Spark, T-SQL, KQL, Power BI Direct Lake — can read the same physical data without copies.

Shortcuts let one OneLake folder point at data that already lives in Azure Data Lake Storage Gen2, Amazon S3, or Google Cloud Storage. The bytes are not moved or duplicated; Fabric queries them in place. This is what Microsoft means by the slogan "one copy of data for all your analytics."

The Medallion Architecture

Both lakehouses and warehouses commonly use a medallion layout to organize data quality zones:

Bronze (raw) — landed exactly as it arrived from the source. Append-only.
Silver (cleansed) — deduplicated, validated, type-corrected, conformed to common keys.
Gold (curated) — business-level aggregates, star schemas, ready for Power BI and reports.

You move data from bronze to silver to gold using Spark notebooks, dataflows, or stored procedures. Each layer is queryable, which makes lineage and debugging far easier than a monolithic ETL job.

When Each Pattern Wins

Choose a pure warehouse (Fabric Warehouse, Synapse dedicated SQL pool) when the workload is structured, the users are BI analysts, and predictable performance is more important than open formats.
Choose a pure data lake (ADLS Gen2 + Synapse serverless) when you have huge volumes of raw and semi-structured data, ad-hoc exploration dominates, and cost matters more than millisecond latency.
Choose a lakehouse (Fabric Lakehouse, Databricks) when you want one storage tier to serve both BI and data science teams, with ACID guarantees on Parquet.

Test Your Knowledge

An organization wants one storage tier that supports both Power BI reporting and data science notebooks, with ACID transactions on Parquet files and no vendor lock-in. Which pattern best fits?

Traditional data warehouse with star schemas

Raw data lake on ADLS Gen2 with CSV files

Lakehouse built on Delta Lake

Operational OLTP database with row-level locking

Test Your Knowledge

In a Microsoft Fabric medallion architecture, which layer is appropriate for a Power BI semantic model that drives an executive dashboard?

Bronze layer (raw, append-only files)

Silver layer (cleansed and conformed)

Gold layer (business-curated aggregates)

Any layer, because Direct Lake reads them identically

Up Next

5.2 Azure Synapse Analytics

Continue learning

Microsoft Azure Data Fundamentals

Azure DP-900

5.1 Modern Data Warehouse and Lakehouse

Key Takeaways

Modern Data Warehouse and Lakehouse

The Five-Stage Reference Architecture

Three Storage Patterns You Must Distinguish

OneLake: One Lake for the Whole Tenant

The Medallion Architecture

When Each Pattern Wins

Microsoft Azure Data Fundamentals

1Chapter 1: Introduction & Exam Overview

2Chapter 2: Core Data Concepts

3Chapter 3: Relational Data on Azure

4Chapter 4: Non-Relational Data on Azure

5Chapter 5: Analytics Workloads on Azure

Azure DP-900

5.1 Modern Data Warehouse and Lakehouse

Key Takeaways

Modern Data Warehouse and Lakehouse

The Five-Stage Reference Architecture

Three Storage Patterns You Must Distinguish

OneLake: One Lake for the Whole Tenant

The Medallion Architecture

When Each Pattern Wins