2.4 Analytical Workloads (OLAP) and Architectures

Key Takeaways

  • OLAP workloads scan large historical datasets, return aggregated results, and use denormalized schemas optimized for read performance.
  • A modern data warehouse pattern combines a data lake for raw data with a structured warehouse for curated data, queried by Power BI.
  • The lakehouse architecture (Microsoft Fabric, Azure Databricks) unifies lake-style storage with warehouse-style SQL semantics on open formats like Delta and Parquet.
  • The medallion architecture organizes lake data into bronze (raw), silver (cleaned/conformed), and gold (business-ready aggregates) layers.
  • ETL transforms data before loading it into a warehouse; ELT loads first and transforms inside the destination using cloud compute, which is the dominant Azure pattern.
Last updated: June 2026

Online Analytical Processing (OLAP) is the workload that helps you understand the business over time: revenue by region by quarter, customer churn, sales attribution. It looks nothing like OLTP at the storage layer, even when the source data starts in an OLTP system.

OLAP Characteristics

AspectOLAP profile
Query shapeAggregations (SUM, COUNT, AVG) over many rows
Latency targetSeconds to minutes is acceptable
ConcurrencyTens to hundreds of analysts, not thousands
SchemaDenormalized — star or snowflake
Read/write mixMostly reads, periodic bulk loads
FreshnessHistorical; minutes to hours behind the source is usually fine

In Azure, OLAP runs on Azure Synapse Analytics dedicated SQL pools, Microsoft Fabric warehouses and lakehouses, Azure Databricks, and Azure Analysis Services.

Modern Data Warehouse Architecture

Microsoft's reference architecture for modern analytics has four stages:

Sources  →  Ingestion  →  Storage  →  Model/Serve  →  Consumption
  • Sources — operational databases, SaaS apps, files, streams.
  • Ingestion — Azure Data Factory pipelines, Synapse pipelines, Fabric Data Factory, Event Hubs for streams.
  • Storage — Azure Data Lake Storage Gen2 for raw and semi-curated data, plus a warehouse for curated, modeled data.
  • Model/Serve — star-schema models in Synapse, Fabric, or Analysis Services; semantic models in Power BI.
  • Consumption — Power BI dashboards, ad-hoc SQL, ML training pipelines.

Data Lake

A data lake is a low-cost, schema-on-read store that holds data in its native format — CSV, JSON, Parquet, Avro, images, audio. Azure Data Lake Storage Gen2 sits on top of Azure Blob Storage and adds a hierarchical namespace and POSIX-style ACLs, which makes lake workloads efficient.

The lake is the landing zone. Raw data lives here cheaply, and you only pay to refine it when there is a known business need.

Lakehouse

A lakehouse unifies the cheap, flexible storage of a data lake with the SQL semantics and reliability of a data warehouse. It is built on open table formats — Delta Lake is the dominant one in Microsoft Fabric and Azure Databricks — which add ACID transactions, time travel, and schema enforcement on top of Parquet files in the lake.

A Microsoft Fabric lakehouse stores data as Delta tables in OneLake and exposes both a SQL endpoint (for warehouse-style queries) and a Spark endpoint (for engineering and data science). One copy of data, many engines.

Medallion Architecture

Microsoft documentation, Fabric, and Databricks all use the medallion architecture to organize lakehouse data into three quality tiers.

LayerAlso calledContentsTypical consumer
BronzeRawExact copy of source data, including duplicates and bad rowsData engineers, audit
SilverCleansed, conformedDeduplicated, type-cast, validated; often joined across sourcesData engineers, advanced analysts
GoldCurated, business-readyAggregates and dimensional models used by reportsAnalysts, Power BI, business users

The medallion pattern lets you reprocess from any layer if business rules change downstream, because the raw bronze copy is always retained.

ETL vs ELT

Both patterns move data from sources to a destination; they differ in where the transform happens.

  • ETL — Extract, Transform, Load. Pull data from sources, transform it in a dedicated engine (such as an SSIS server or Data Factory mapping data flows), then load the result. This was standard when warehouse storage was expensive.
  • ELT — Extract, Load, Transform. Pull data from sources, land it raw in the lake or warehouse, then transform inside the destination using its native engine (T-SQL, Spark, Fabric notebooks). Cloud compute is elastic and storage is cheap, so ELT is the dominant Azure pattern today.

Azure Data Factory and Fabric Data Factory both support either pattern; the choice depends on whether transforms run in mapping data flows (ETL) or as T-SQL/Spark inside the warehouse (ELT).

Dimensional Modeling: Star and Snowflake Schemas

OLAP storage is denormalized. The two canonical models are the star schema and the snowflake schema.

Star Schema

A single central fact table (sales, page views, claims) holds measurable events at a chosen grain. It is surrounded by dimension tables (date, product, customer, store) that hold descriptive attributes. Joins go one level deep — the fact joins directly to each dimension.

     DimDate
         \
DimStore - FactSales - DimProduct
         /
     DimCustomer

Snowflake Schema

A snowflake schema is a star schema in which the dimensions themselves are normalized. DimProduct might split into DimProduct, DimSubcategory, and DimCategory. Storage is slightly smaller and updates to category names happen in one place, but queries need more joins, so star is the default choice in most Microsoft references.

Fact and Dimension Vocabulary

  • Fact — a numeric, additive event (revenue, units, duration).
  • Dimension — a descriptive attribute (date, product name, region).
  • Grain — the level of detail in a fact table (one row per order line, one row per page view).
  • Surrogate key — a synthetic integer key in a dimension, independent of the source system's natural key.

Understanding fact/dimension/grain vocabulary is enough for DP-900 — deeper dimensional modeling is covered in DP-700.

Test Your Knowledge

A retail analytics team lands raw POS files in a Microsoft Fabric lakehouse as Delta tables, then cleans and deduplicates them in a second zone, and finally publishes aggregated sales-by-region tables that Power BI consumes directly. Which architectural pattern are they following?

A
B
C
D
Test Your Knowledge

Why is ELT generally preferred over ETL for modern Azure analytics workloads?

A
B
C
D