2.4 Analytical Workloads (OLAP) and Architectures

Key Takeaways

OLAP workloads scan large historical datasets, return aggregated results, and use denormalized schemas optimized for read performance.
A modern data warehouse pattern combines a data lake for raw data with a structured warehouse for curated data, queried by Power BI.
The lakehouse architecture (Microsoft Fabric, Azure Databricks) unifies lake-style storage with warehouse-style SQL semantics on open formats like Delta and Parquet.
The medallion architecture organizes lake data into bronze (raw), silver (cleaned/conformed), and gold (business-ready aggregates) layers.
ETL transforms data before loading it into a warehouse; ELT loads first and transforms inside the destination using cloud compute, which is the dominant Azure pattern.

Last updated: June 2026

Online Analytical Processing (OLAP) is the workload that helps you understand the business over time: revenue by region by quarter, customer churn, sales attribution. It looks nothing like OLTP at the storage layer, even when the source data starts in an OLTP system.

OLAP Characteristics

Aspect	OLAP profile
Query shape	Aggregations (SUM, COUNT, AVG) over many rows
Latency target	Seconds to minutes is acceptable
Concurrency	Tens to hundreds of analysts, not thousands
Schema	Denormalized — star or snowflake
Read/write mix	Mostly reads, periodic bulk loads
Freshness	Historical; minutes to hours behind the source is usually fine

In Azure, OLAP runs on Azure Synapse Analytics dedicated SQL pools, Microsoft Fabric warehouses and lakehouses, Azure Databricks, and Azure Analysis Services.

Modern Data Warehouse Architecture

Microsoft's reference architecture for modern analytics has four stages:

Sources  →  Ingestion  →  Storage  →  Model/Serve  →  Consumption

Sources — operational databases, SaaS apps, files, streams.
Ingestion — Azure Data Factory pipelines, Synapse pipelines, Fabric Data Factory, Event Hubs for streams.
Storage — Azure Data Lake Storage Gen2 for raw and semi-curated data, plus a warehouse for curated, modeled data.
Model/Serve — star-schema models in Synapse, Fabric, or Analysis Services; semantic models in Power BI.
Consumption — Power BI dashboards, ad-hoc SQL, ML training pipelines.

Data Lake

A data lake is a low-cost, schema-on-read store that holds data in its native format — CSV, JSON, Parquet, Avro, images, audio. Azure Data Lake Storage Gen2 sits on top of Azure Blob Storage and adds a hierarchical namespace and POSIX-style ACLs, which makes lake workloads efficient.

The lake is the landing zone. Raw data lives here cheaply, and you only pay to refine it when there is a known business need.

Lakehouse

A lakehouse unifies the cheap, flexible storage of a data lake with the SQL semantics and reliability of a data warehouse. It is built on open table formats — Delta Lake is the dominant one in Microsoft Fabric and Azure Databricks — which add ACID transactions, time travel, and schema enforcement on top of Parquet files in the lake.

A Microsoft Fabric lakehouse stores data as Delta tables in OneLake and exposes both a SQL endpoint (for warehouse-style queries) and a Spark endpoint (for engineering and data science). One copy of data, many engines.

Medallion Architecture

Microsoft documentation, Fabric, and Databricks all use the medallion architecture to organize lakehouse data into three quality tiers.

Layer	Also called	Contents	Typical consumer
Bronze	Raw	Exact copy of source data, including duplicates and bad rows	Data engineers, audit
Silver	Cleansed, conformed	Deduplicated, type-cast, validated; often joined across sources	Data engineers, advanced analysts
Gold	Curated, business-ready	Aggregates and dimensional models used by reports	Analysts, Power BI, business users

The medallion pattern lets you reprocess from any layer if business rules change downstream, because the raw bronze copy is always retained.

ETL vs ELT

Both patterns move data from sources to a destination; they differ in where the transform happens.

ETL — Extract, Transform, Load. Pull data from sources, transform it in a dedicated engine (such as an SSIS server or Data Factory mapping data flows), then load the result. This was standard when warehouse storage was expensive.
ELT — Extract, Load, Transform. Pull data from sources, land it raw in the lake or warehouse, then transform inside the destination using its native engine (T-SQL, Spark, Fabric notebooks). Cloud compute is elastic and storage is cheap, so ELT is the dominant Azure pattern today.

Azure Data Factory and Fabric Data Factory both support either pattern; the choice depends on whether transforms run in mapping data flows (ETL) or as T-SQL/Spark inside the warehouse (ELT).

Dimensional Modeling: Star and Snowflake Schemas

OLAP storage is denormalized. The two canonical models are the star schema and the snowflake schema.

Star Schema

A single central fact table (sales, page views, claims) holds measurable events at a chosen grain. It is surrounded by dimension tables (date, product, customer, store) that hold descriptive attributes. Joins go one level deep — the fact joins directly to each dimension.

     DimDate
         \
DimStore - FactSales - DimProduct
         /
     DimCustomer

Snowflake Schema

A snowflake schema is a star schema in which the dimensions themselves are normalized. DimProduct might split into DimProduct, DimSubcategory, and DimCategory. Storage is slightly smaller and updates to category names happen in one place, but queries need more joins, so star is the default choice in most Microsoft references.

Fact and Dimension Vocabulary

Fact — a numeric, additive event (revenue, units, duration).
Dimension — a descriptive attribute (date, product name, region).
Grain — the level of detail in a fact table (one row per order line, one row per page view).
Surrogate key — a synthetic integer key in a dimension, independent of the source system's natural key.

Understanding fact/dimension/grain vocabulary is enough for DP-900 — deeper dimensional modeling is covered in DP-700.

Test Your Knowledge

A retail analytics team lands raw POS files in a Microsoft Fabric lakehouse as Delta tables, then cleans and deduplicates them in a second zone, and finally publishes aggregated sales-by-region tables that Power BI consumes directly. Which architectural pattern are they following?

Two-phase commit

Medallion architecture (bronze, silver, gold)

Snowflake schema with surrogate keys

Lambda architecture with speed and batch layers

Test Your Knowledge

Why is ELT generally preferred over ETL for modern Azure analytics workloads?

ELT requires no source extraction, reducing network egress costs.

ELT skips the transform step entirely, leaving data raw for downstream consumers.

ELT pushes transformations into elastic cloud compute (T-SQL or Spark) and uses cheap object storage as the staging area.

ELT is the only pattern that supports incremental loads, while ETL only supports full loads.

Up Next

2.5 Batch vs Stream Processing

Continue learning

Microsoft Azure Data Fundamentals

Azure DP-900

2.4 Analytical Workloads (OLAP) and Architectures

Key Takeaways

OLAP Characteristics

Modern Data Warehouse Architecture

Data Lake

Lakehouse

Medallion Architecture

ETL vs ELT

Dimensional Modeling: Star and Snowflake Schemas

Star Schema

Snowflake Schema

Fact and Dimension Vocabulary

Microsoft Azure Data Fundamentals

1Chapter 1: Introduction & Exam Overview

2Chapter 2: Core Data Concepts

3Chapter 3: Relational Data on Azure

4Chapter 4: Non-Relational Data on Azure

5Chapter 5: Analytics Workloads on Azure

Azure DP-900

2.4 Analytical Workloads (OLAP) and Architectures

Key Takeaways

OLAP Characteristics

Modern Data Warehouse Architecture

Data Lake

Lakehouse

Medallion Architecture

ETL vs ELT

Dimensional Modeling: Star and Snowflake Schemas

Star Schema

Snowflake Schema

Fact and Dimension Vocabulary