2.2 Relational vs Non-Relational Data

Key Takeaways

  • Relational databases store data in tables of rows and columns with a fixed schema and enforce ACID transactions.
  • Normalization removes redundancy by splitting data into related tables linked by primary and foreign keys.
  • Non-relational (NoSQL) stores include four main families: key-value, document, column-family, and graph.
  • Structured data fits a fixed schema, semi-structured data carries its own schema in tags (JSON, XML, Avro), and unstructured data has no inherent schema (images, video, audio).
  • Azure Cosmos DB exposes multiple non-relational APIs (NoSQL, MongoDB, Cassandra, Gremlin, Table) under one global service.
Last updated: June 2026

Microsoft's reference framework divides data stores into relational and non-relational, then layers a separate axis for structured / semi-structured / unstructured data. DP-900 questions blend the two axes, so it pays to understand both.

Relational Data

A relational database organizes data into tables of rows (records) and columns (attributes). Every row in a table has the same columns, and every column has a defined data type (for example INT, VARCHAR(50), DATETIME2). Relationships between tables are expressed with primary keys (uniquely identify each row) and foreign keys (reference a primary key in another table).

Why a Schema Helps

The schema is a contract: an Orders table will always have a CustomerId column of type INT, and that value will always exist in Customers.CustomerId. The database engine enforces this contract on every insert and update. The trade-off is that changing the schema is heavier than in NoSQL — you typically ship a migration script.

Normalization

Normalization is the process of splitting data into related tables so each fact is stored exactly once. The classic example is moving customer details out of an Orders table into a separate Customers table and linking them with CustomerId. This eliminates update anomalies (changing the customer's address in one row but not another) and shrinks storage, at the cost of more JOINs at query time.

The forms most quoted on DP-900 are:

  • 1NF — each column holds a single value, no repeating groups.
  • 2NF — every non-key column depends on the entire primary key.
  • 3NF — non-key columns depend only on the key, not on other non-key columns.

OLTP databases are usually normalized to 3NF; OLAP warehouses are intentionally denormalized for query speed (see section 2.4).

ACID Transactions

Relational engines guarantee transactions are Atomic, Consistent, Isolated, and Durable (covered in depth in section 2.3). This is why relational stores remain the default for systems where data correctness matters more than raw throughput — banking, healthcare, inventory.

Non-Relational (NoSQL) Data

Non-relational databases relax the fixed-schema requirement to gain flexibility, horizontal scale, or specialized access patterns. Microsoft groups them into four families.

NoSQL familyShapeBest forAzure example
Key-valueOpaque value stored against a keyCaches, session state, feature flagsAzure Cache for Redis, Cosmos DB for Table
DocumentJSON/BSON documents grouped in collectionsApp data with flexible attributes per itemAzure Cosmos DB for NoSQL, Cosmos DB for MongoDB
Column-familyRows grouped by row key, columns grouped into familiesTime-series, IoT telemetry, very wide rowsAzure Cosmos DB for Apache Cassandra, HBase on HDInsight
GraphVertices and edges representing entities and relationshipsSocial networks, fraud detection, recommendation graphsAzure Cosmos DB for Apache Gremlin

Key-Value Stores

The simplest model. You hand the database a key, you get back a value. Reads and writes are typically microseconds because there are no joins, no schema checks, and the value is treated as opaque to the engine. Azure Cache for Redis is the canonical Azure example.

Document Stores

Each item is a self-contained JSON document with its own attributes. Two documents in the same collection can have entirely different fields, which suits applications where product attributes vary by category. Azure Cosmos DB for NoSQL is Microsoft's first-party document database; it also offers a wire-compatible MongoDB API.

Column-Family Stores

Data is grouped into rows, but each row stores columns in named column families. Storage is sparse — a row only stores columns it has. This is efficient for very wide datasets where most cells would be null in a relational table. Time-series and IoT telemetry are typical workloads. Cosmos DB for Apache Cassandra is the Azure offering.

Graph Stores

A graph database represents data as vertices (nodes) and edges (relationships). The engine is optimized for traversing relationships — "find all friends of friends who bought product X" — which would require many joins in a relational design. Azure Cosmos DB for Apache Gremlin is the Azure option.

Structured, Semi-Structured, and Unstructured Data

This is a separate axis. It describes the shape of the data, not the engine.

CategoryHas schema?Examples
StructuredYes, fixed and external (defined in the database)Tables in Azure SQL Database, dimensional models in Fabric
Semi-structuredYes, but embedded in the data itself (tags, keys)JSON, XML, YAML, Avro, Parquet schemas
UnstructuredNo usable schema for queryImages, video, audio, free-text documents, PDFs

A common DP-900 trap: JSON and XML are semi-structured, not unstructured. They carry their own schema through tags, even though the schema can change document to document.

Unstructured data still has to live somewhere. In Azure, unstructured files typically land in Azure Blob Storage, while semi-structured data may live in Cosmos DB or as Parquet/Delta files in Azure Data Lake Storage Gen2.

Picking the Right Store

Microsoft's recommended decision pattern, condensed:

  1. Does the data have a fixed schema and need ACID transactions? → Relational (Azure SQL family).
  2. Does each record have flexible attributes that vary item to item? → Document (Cosmos DB for NoSQL).
  3. Is the dominant access pattern a single key lookup at very low latency? → Key-value (Azure Cache for Redis).
  4. Are relationships the primary thing you query? → Graph (Cosmos DB for Gremlin).
  5. Is the data binary or free-text without a usable schema? → Blob storage or Data Lake Storage Gen2.
Test Your Knowledge

An online catalog stores roughly 50 million products. Each product category (electronics, clothing, books) has a different set of attributes, and new categories are added without code changes. Which type of data store is the best fit?

A
B
C
D
Test Your Knowledge

Which statement most accurately distinguishes semi-structured data from unstructured data?

A
B
C
D