4.4 Azure Cosmos DB: Use Cases and Architecture
Key Takeaways
- Azure Cosmos DB is a fully managed, multi-model, globally distributed NoSQL database offering single-digit-millisecond latency and 99.999% availability with multi-region writes.
- Turnkey global distribution lets you add or remove read/write regions with a click; data is replicated automatically and clients are routed to the nearest region.
- Throughput is provisioned (or autoscaled) in Request Units per second (RU/s); a Request Unit is a normalized currency for the CPU, memory, and I/O a database operation costs.
- Data is sharded across logical partitions by a partition key; choosing a high-cardinality, evenly accessed partition key is critical to avoid hot partitions.
- Cosmos DB is ideal for global, write-heavy, low-latency applications such as IoT telemetry, retail catalogs, gaming, and personalization, but it is not a replacement for relational OLTP needing complex joins.
Quick Answer: Azure Cosmos DB is a fully managed, multi-model, globally distributed NoSQL database. It guarantees single-digit-millisecond read/write latency, up to 99.999% availability with multi-region writes, elastic scale, and five tunable consistency levels. You provision performance in Request Units per second (RU/s) and scale by partitioning data on a chosen key.
Where Azure Table storage is a simple, cheap key-value store, Cosmos DB is the premium, planet-scale NoSQL platform. On DP-900 you must be able to identify use cases for Cosmos DB and describe its APIs (covered in the next section).
What Makes Cosmos DB Different
- Global distribution, turnkey. Add or remove Azure regions to a Cosmos DB account with a single setting. Data is replicated to every chosen region automatically, and the SDK routes each client to the nearest region for low latency.
- Multi-region writes. Optionally make every region a write region (active-active), so users on different continents all write locally. Local writes keep write latency low worldwide, and this configuration is the one the 99.999% availability figure applies to.
- Guaranteed SLAs. Microsoft offers financially backed SLAs on latency (single-digit ms at the 99th percentile), throughput, consistency, and availability — rare among databases.
- Schema-agnostic and auto-indexed. Items have no enforced schema, and by default every property is indexed, so queries are fast without manual index design.
- Elastic scale. Storage and throughput scale independently and virtually without limit.
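The region routing described above can be pictured with a minimal sketch. It assumes the client holds an ordered list of preferred regions (nearest first) and uses the first one the account is currently replicated to. The function `pick_region`, the region list, and the Paris client are hypothetical names for illustration, not the SDK's API.

```python
from typing import Optional, Sequence, Set


def pick_region(preferred: Sequence[str], available: Set[str]) -> Optional[str]:
    """Return the first preferred region that the account currently serves."""
    for region in preferred:
        if region in available:
            return region
    return None  # no preferred region is available


# Hypothetical account replicated to three regions.
account_regions = {"West Europe", "East US", "Southeast Asia"}

# A client in Paris lists regions from nearest to farthest (an assumption).
paris_client = ["West Europe", "East US", "Southeast Asia"]

print(pick_region(paris_client, account_regions))
# If West Europe is removed from the account, the client falls back to the next region.
print(pick_region(paris_client, account_regions - {"West Europe"}))
```

The point of the sketch is only that adding or removing a region changes where clients land without any change to application logic.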
Request Units (RU/s)
Cosmos DB abstracts CPU, memory, and IOPS into a single normalized currency: the Request Unit (RU). Every operation — a read, a write, a query — costs a measurable number of RUs. A point read of a 1 KB item costs roughly 1 RU; writes and queries cost more.
You provision throughput in RU/s, and there are three modes:
| Mode | How it works | Best for |
|---|---|---|
| Provisioned (manual) | You set a fixed RU/s; billed for that capacity | Steady, predictable workloads |
| Autoscale | You set a max RU/s; Cosmos scales between 10% and 100% of it automatically | Variable or spiky workloads |
| Serverless | Pay per RU consumed, no minimum | Dev/test, intermittent, low-traffic apps |
Exceeding your provisioned RU/s causes requests to be throttled (HTTP 429), which the SDK retries. Sizing RU/s correctly is a core operational task.
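To make the autoscale range and throttling concrete, here is a simplified sketch. It treats provisioned RU/s as a budget that resets every second and counts the requests that would not fit. The helper names (`autoscale_range`, `count_throttled`) and the traffic mix are hypothetical, and the real service's accounting and SDK retry behavior are more nuanced than this model.

```python
from typing import List, Tuple


def autoscale_range(max_ru: int) -> Tuple[int, int]:
    """Autoscale moves between 10% and 100% of the configured maximum RU/s."""
    return max_ru // 10, max_ru


def count_throttled(provisioned_ru: int, request_costs: List[int]) -> int:
    """Count requests in one second that exceed the RU budget (HTTP 429)."""
    remaining = provisioned_ru
    throttled = 0
    for cost in request_costs:
        if cost <= remaining:
            remaining -= cost   # request is served and its RUs are consumed
        else:
            throttled += 1      # budget exhausted: this request would get a 429
    return throttled


# One second of traffic: 100 point reads (~1 RU each) and 70 writes (~5 RU each).
one_second_of_traffic = [1] * 100 + [5] * 70

print(autoscale_range(4000))                        # (400, 4000)
print(count_throttled(400, one_second_of_traffic))  # 10
print(count_throttled(450, one_second_of_traffic))  # 0
```

The traffic above needs 450 RU in that second, so a 400 RU/s setting throttles the last ten writes while 450 RU/s serves everything.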
Partitioning
Cosmos DB scales horizontally by spreading data across many physical partitions. You choose a partition key (for example /deviceId, /userId, or /category). All items that share the same partition key value form one logical partition, and Cosmos DB hashes the key value to decide which physical partition stores it; many logical partitions map onto each physical partition.
The partition key choice is the most important design decision:
- A good partition key has high cardinality (many distinct values) and spreads both storage and request volume evenly.
- A bad key concentrates traffic on a few values, creating a hot partition that throttles while the rest of the database sits idle.
- A logical partition has a 20 GB storage ceiling, so the key must also avoid unbounded growth on any single value.
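The difference between a good and a bad partition key can be illustrated with a small simulation. This is a sketch under stated assumptions: it uses SHA-256 modulo a fixed partition count as a stand-in for Cosmos DB's internal hashing, and the partition count, device names, and traffic split are all made up.

```python
import hashlib
from collections import Counter

PHYSICAL_PARTITIONS = 4  # hypothetical; the service manages this itself


def physical_partition(key_value: str) -> int:
    """Map a partition key value to a partition (stand-in for the real hash)."""
    digest = hashlib.sha256(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PHYSICAL_PARTITIONS


def load_by_partition(key_values):
    """Count how many requests land on each physical partition."""
    return Counter(physical_partition(v) for v in key_values)


# 10,000 requests keyed by /deviceId, spread over 1,000 devices.
device_keys = [f"device-{i % 1000}" for i in range(10_000)]
# 10,000 requests keyed by /category, where 90% hit a single value.
category_keys = ["sensor"] * 9_000 + ["gateway"] * 1_000

even = load_by_partition(device_keys)
skewed = load_by_partition(category_keys)

print("busiest share with /deviceId:", max(even.values()) / 10_000)
print("busiest share with /category:", max(skewed.values()) / 10_000)  # at least 0.9
```

With the high-cardinality key the busiest partition carries roughly its fair share, while the low-cardinality key sends at least 90% of requests to one partition, which is the hot-partition pattern described above.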
When Cosmos DB Is the Right Answer
| Use case | Why Cosmos DB fits |
|---|---|
| IoT and telemetry | Massive write throughput, time-series friendly, elastic scale |
| Retail product catalog | Flexible schema per product category, global low-latency reads |
| Gaming | Single-digit-ms latency for leaderboards and player state worldwide |
| Personalization / user profiles | Globally distributed reads close to each user |
| Web and mobile backends | Active-active writes, auto-indexing, elastic scale during launches |
When It Is Not the Right Answer
- Workloads needing complex multi-table joins, foreign keys, and full ACID across many tables → use the Azure SQL family.
- Heavy ad-hoc analytical scans of historical data → use a warehouse or lakehouse (Synapse / Fabric). For analytics on Cosmos DB data without ETL, enable the analytical store and query it through Azure Synapse Link, or use Microsoft Fabric mirroring.
- Simple, low-cost key-value needs without global reach → plain Table storage may be cheaper.
The one-line exam tell: a scenario that stresses global distribution, millisecond latency at scale, flexible schema, or massive write throughput points to Cosmos DB.
Resource Hierarchy
Cosmos DB organizes data in a clear hierarchy you should recognize: an account (the globally distributed top-level resource, tied to one API) contains databases, which contain containers (called collections, tables, or graphs depending on the API), which hold items (documents, rows, nodes/edges). Throughput (RU/s) can be provisioned at the database level (shared across its containers) or at the container level (dedicated). Dedicated container throughput gives predictable performance; shared database throughput is cheaper for many small containers.
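The hierarchy and the two throughput options can be modeled in a few lines. This is only an illustrative data model, not the SDK: the class names, the account name `contoso-games`, and the RU/s figures are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Container:
    name: str
    dedicated_ru: Optional[int] = None  # None: draws from the database's shared RU/s
    items: List[dict] = field(default_factory=list)


@dataclass
class Database:
    name: str
    shared_ru: Optional[int] = None  # RU/s shared by containers without dedicated RU/s
    containers: Dict[str, Container] = field(default_factory=dict)

    def add(self, container: Container) -> Container:
        self.containers[container.name] = container
        return container

    def total_provisioned_ru(self) -> int:
        dedicated = sum(c.dedicated_ru or 0 for c in self.containers.values())
        return (self.shared_ru or 0) + dedicated


@dataclass
class Account:
    name: str
    api: str  # an account is tied to one API
    databases: Dict[str, Database] = field(default_factory=dict)


account = Account(name="contoso-games", api="NoSQL")
game = Database(name="game", shared_ru=1000)
account.databases[game.name] = game

game.add(Container("settings"))                        # shares the 1000 RU/s
game.add(Container("lookups"))                         # shares the 1000 RU/s
game.add(Container("leaderboard", dedicated_ru=4000))  # predictable, dedicated RU/s

game.containers["leaderboard"].items.append({"id": "p1", "score": 9001})
print(game.total_provisioned_ru())  # 5000
```

Here the two small containers split one shared budget, while the busy leaderboard container gets its own, matching the cost versus predictability trade-off described above.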
The Analytical Store and Synapse Link
Cosmos DB is an operational (OLTP-style) store, and running heavy analytical scans against it would consume RUs and slow the application. The analytical store solves this: when enabled on a container, Cosmos DB automatically keeps a column-oriented copy of the data, isolated from the transactional workload, with no RU cost to the operational side.
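Azure Synapse Link then queries that analytical store directly, giving HTAP (hybrid transactional/analytical processing) without ETL. Microsoft Fabric mirroring is a comparable no-ETL option that replicates the data into Fabric's OneLake rather than reading the analytical store. The classic exam scenario "run analytics on Cosmos DB data without affecting transactional performance" is answered by the analytical store via Synapse Link, or by Fabric mirroring.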
Backups and Security
Cosmos DB takes automatic backups and supports both periodic and continuous backup (point-in-time restore). Security layers include Microsoft Entra ID RBAC, primary/secondary keys, resource tokens for fine-grained access, IP firewalls, private endpoints, and always-on encryption at rest and in transit. These mirror the storage-account security model and reinforce that Entra ID is the preferred, secret-free access method across Azure data services.
RU Sizing Intuition
A rough mental model the exam rewards: a 1 KB point read costs ~1 RU; a 1 KB write costs roughly 5 RU; queries cost more depending on how many items they scan and whether indexes are used. If an app does 100 reads and 20 writes per second on 1 KB items, a back-of-envelope estimate is about 100 + (20 × 5) = 200 RU/s, before query overhead.
You do not compute exact RU charges on DP-900, but you should understand that writes cost more than reads, that unindexed or large queries cost the most, and that exceeding provisioned RU/s causes throttling (HTTP 429), which autoscale or higher provisioning relieves.
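The back-of-envelope estimate can be written as a tiny function. The per-operation costs are the rough figures quoted above for 1 KB items, not exact charges, and the function name is hypothetical.

```python
READ_RU_PER_1KB = 1   # approximate cost of a 1 KB point read
WRITE_RU_PER_1KB = 5  # approximate cost of a 1 KB write


def estimate_ru_per_second(reads_per_sec: int, writes_per_sec: int) -> int:
    """Rough RU/s estimate for 1 KB items, ignoring query overhead."""
    return reads_per_sec * READ_RU_PER_1KB + writes_per_sec * WRITE_RU_PER_1KB


print(estimate_ru_per_second(100, 20))  # 200
```

Because a write is weighted five times a read here, doubling the write rate to 40 per second raises the estimate to 300 RU/s, while doubling only the reads would give the same 300 RU/s from 100 extra operations instead of 20.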
A gaming company is launching worldwide and needs player profile and leaderboard data that can be read AND written with single-digit-millisecond latency from any continent, with each region able to accept writes locally. Which Azure service best fits?
A team designing a Cosmos DB container for IoT telemetry from 5 million devices wants to avoid throttling and hot partitions. Which partition key choice is best?