4.2 Azure Blob Storage
Key Takeaways
- Azure Blob storage holds unstructured object data in a flat namespace of containers, each containing blobs addressed by name.
- There are three blob types: block blobs (files and media, the default), append blobs (logging), and page blobs (random-access VM disks up to 8 TB).
- Access tiers trade storage cost against access cost and latency: Hot (frequent), Cool (>=30 days), Cold (>=90 days), and Archive (offline, >=180 days, rehydration required).
- Lifecycle management policies automatically move blobs between tiers or delete them based on age or last-access time.
- Azure Data Lake Storage Gen2 is Blob storage with the hierarchical namespace enabled, adding real directories and POSIX-style ACLs for analytics.
Azure Blob Storage
Quick Answer: Azure Blob storage is Azure's massively scalable object store for unstructured data — images, video, backups, logs, and the raw files of a data lake. Data lives in containers (think top-level folders) as blobs (individual objects). You pick a blob type (block, append, or page) and an access tier (Hot, Cool, Cold, or Archive) to balance cost against retrieval speed.
"Blob" stands for Binary Large Object. Blob storage is schema-less — Azure does not look inside the object, which is what makes it the right home for anything without a fixed structure.
Containers and the Flat Namespace
A blob storage account organizes data into containers. By default the namespace is flat: a blob named 2026/01/sales.csv is a single object whose name happens to contain slashes — there is no real 2026 folder. Tools display this as a virtual folder tree, but the storage layer sees one name. (Enabling the hierarchical namespace, below, changes this.)
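As a rough illustration of how tools build that virtual tree, the sketch below derives "folders" purely from blob names and a delimiter. The function name `list_virtual_folder` and the sample names are hypothetical; this models the idea, not any SDK call.

```python
def list_virtual_folder(blob_names, prefix="", delimiter="/"):
    """Return (folders, blobs) found directly under `prefix`, using names only."""
    folders, blobs = set(), []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains, so the tool shows a virtual folder here.
            folders.add(prefix + head + delimiter)
        else:
            # No delimiter left: this is a blob at the current level.
            blobs.append(name)
    return sorted(folders), sorted(blobs)


names = ["2026/01/sales.csv", "2026/02/sales.csv", "readme.txt"]
root_folders, root_blobs = list_virtual_folder(names)
year_folders, year_blobs = list_virtual_folder(names, prefix="2026/")
```

Nothing in the sketch stores a folder: the tree is recomputed from names on every listing, which is why renaming a "folder" in a flat namespace touches every blob under it.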
The Three Blob Types
| Blob type | Optimized for | Example |
|---|---|---|
| Block blob | Files and streaming media uploaded in blocks | Documents, images, video, data-lake files |
| Append blob | Append-only writes | Log files, audit trails, telemetry appends |
| Page blob | Frequent random reads/writes up to 8 TB | Backing disks for Azure VMs |
Block blob is the default and the one DP-900 cares about most. Append blobs only allow adding to the end. Page blobs back unmanaged VM disks.
Access Tiers
The access tier sets how cheaply data is stored versus how much it costs (and how long it takes) to read it back. This is the single most-tested blob topic.
| Tier | Storage cost | Access cost | Min. retention | Availability | Use case |
|---|---|---|---|---|---|
| Hot | Highest | Lowest | none | 99.9% | Frequently accessed, active data |
| Cool | Lower | Higher | 30 days | 99% | Infrequently accessed, short-term backup |
| Cold | Lower still | Higher still | 90 days | 99% | Rarely accessed but still online |
| Archive | Lowest | Highest | 180 days | Offline | Long-term retention, compliance |
Key exam traps:
- Hot, Cool, and Cold are online, so data is readable immediately. Archive is offline: you must rehydrate a blob (move it to an online tier such as Hot or Cool) before reading it, which can take hours.
- Each cooler tier has an early-deletion penalty if you remove or move the data before its minimum-retention period (30 / 90 / 180 days).
- The online tiers (Hot, Cool, and Cold) can be set on an individual blob or inherited from the account's default tier. Archive can only be set on individual blobs, never as an account default.
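The early-deletion trap can be made concrete with a little arithmetic. The sketch below assumes the penalty is prorated by the days remaining in the minimum-retention period, which is how Azure's pricing guidance describes it; the function name is hypothetical and no prices are modeled.

```python
# Minimum-retention periods from the access-tier table, in days.
MIN_RETENTION_DAYS = {"Hot": 0, "Cool": 30, "Cold": 90, "Archive": 180}


def early_deletion_days(tier, days_in_tier):
    """Days of storage still billed if a blob leaves `tier` after `days_in_tier` days."""
    return max(0, MIN_RETENTION_DAYS[tier] - days_in_tier)


cool_penalty = early_deletion_days("Cool", 21)        # left Cool 9 days early
archive_penalty = early_deletion_days("Archive", 200)  # past the minimum
```

A blob deleted from Cool on day 21 is still billed for the remaining 9 days, while a blob that has already served its minimum period leaves with no penalty.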
The decision rule: choose the warmest tier whose access pattern you actually have, because cooler tiers punish frequent reads and early deletion.
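One way to picture the decision rule is as a small function. This is only a sketch: the read-frequency thresholds `RARE_READS` and `INFREQUENT_READS` are illustrative assumptions, not Azure guidance, while the 30/90/180-day figures come from the table above.

```python
RARE_READS = 1        # assumed: at most one read per month counts as "rare"
INFREQUENT_READS = 10  # assumed: at most ten reads per month counts as "infrequent"


def suggest_tier(reads_per_month, retention_days, needs_immediate_reads=True):
    """Pick the coolest tier that the access pattern and retention can justify."""
    rare = reads_per_month <= RARE_READS
    if rare and not needs_immediate_reads and retention_days >= 180:
        return "Archive"  # offline is acceptable and the minimum period is met
    if rare and retention_days >= 90:
        return "Cold"
    if reads_per_month <= INFREQUENT_READS and retention_days >= 30:
        return "Cool"
    return "Hot"  # frequent reads or short-lived data


audit_files = suggest_tier(0, 7 * 365, needs_immediate_reads=False)
active_media = suggest_tier(500, 365)
```

Short-lived data falls through to Hot even when it is rarely read, because the early-deletion penalty would outweigh the cheaper storage rate.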
Lifecycle Management
Manually re-tiering blobs does not scale. A lifecycle management policy is a JSON rule set on the account that automatically acts on blobs based on age or last-access time — for example, move to Cool after 30 days, to Archive after 90 days, delete after 7 years. This is how organizations keep storage bills down without writing code.
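The example rule above might look like the following when written out. The sketch builds the policy as a Python dictionary and prints it as JSON; the rule name `age-out-blobs`, the `logs/` prefix, and the helper function are hypothetical, and 7 years is approximated as 2,555 days.

```python
import json


def build_lifecycle_policy(cool_after=30, archive_after=90, delete_after=7 * 365,
                           rule_name="age-out-blobs", prefixes=None):
    """Build a lifecycle policy document that tiers and deletes block blobs by age."""
    rule = {
        "enabled": True,
        "name": rule_name,
        "type": "Lifecycle",
        "definition": {
            "filters": {"blobTypes": ["blockBlob"]},
            "actions": {
                "baseBlob": {
                    "tierToCool": {"daysAfterModificationGreaterThan": cool_after},
                    "tierToArchive": {"daysAfterModificationGreaterThan": archive_after},
                    "delete": {"daysAfterModificationGreaterThan": delete_after},
                }
            },
        },
    }
    if prefixes:
        # Optionally limit the rule to blobs whose names start with these prefixes.
        rule["definition"]["filters"]["prefixMatch"] = list(prefixes)
    return {"rules": [rule]}


policy = build_lifecycle_policy(prefixes=["logs/"])
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of document that is attached to the storage account; the platform then evaluates it on its own schedule, so no application code has to run.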
Azure Data Lake Storage Gen2
Azure Data Lake Storage Gen2 (ADLS Gen2) is not a separate service — it is Blob storage with the hierarchical namespace (HNS) feature switched on. HNS adds:
- Real directories that can be created, renamed, and deleted atomically (a flat blob store has to copy every object to rename a "folder").
- POSIX-style ACLs for fine-grained, folder-level permissions.
- A driver (abfs://) optimized for analytics engines such as Spark, Synapse, and Databricks.
Because ADLS Gen2 sits on the same platform as Blob storage, it inherits the same tiers, redundancy, and security. For analytics workloads — the data lake at the heart of the medallion architecture — HNS is enabled; for pure object storage (media, backups), it is left off.
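For orientation, the paths that analytics engines use with the abfs driver generally take the form container@account followed by the file path. The helper below is a minimal sketch of that format; the account name `mydatalake` and container `bronze` are made up, and `abfss` is the TLS variant of the scheme.

```python
def abfs_uri(account, container, path="", secure=True):
    """Build an ABFS URI for a file or directory in an ADLS Gen2 account."""
    scheme = "abfss" if secure else "abfs"
    return f"{scheme}://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


sales_uri = abfs_uri("mydatalake", "bronze", "2026/01/sales.csv")
```

Note the `dfs` endpoint in the host name: the same account is also reachable through its `blob` endpoint, which reflects that ADLS Gen2 and Blob storage are one platform.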
Common Blob Scenarios
- Static website hosting. A blob account can serve a static site directly from a $web container.
- Backup and archive. Database and VM backups land in Cool or Archive tiers.
- Data lake landing zone. Raw ingestion files arrive as block blobs in ADLS Gen2, ready for the bronze layer.
- Media distribution. Images and video stream from Hot-tier block blobs, often behind a CDN.
Moving Data Into Blob Storage
DP-900 expects awareness of the common tools for getting data in and out. AzCopy is a command-line utility optimized for high-throughput bulk copies. Azure Storage Explorer is a free graphical client for browsing and transferring blobs. Azure Data Factory copy activities move data on a schedule. The REST API and SDKs (.NET, Python, Java, JavaScript) embed transfers in applications, and abfs:// is the analytics driver used by Spark and Synapse against ADLS Gen2. Recognizing AzCopy and Storage Explorer by name is a frequent exam point.
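To make AzCopy less abstract, the sketch below assembles a typical bulk-upload command as an argument list without running it. The local folder, account name, container, and SAS placeholder are all hypothetical, and the helper function is only an illustration of the command's shape.

```python
import shlex


def azcopy_upload_command(local_dir, account, container, sas_token, recursive=True):
    """Assemble an `azcopy copy` command that uploads a local folder to a container."""
    dest = f"https://{account}.blob.core.windows.net/{container}?{sas_token}"
    cmd = ["azcopy", "copy", local_dir, dest]
    if recursive:
        cmd.append("--recursive")  # include subdirectories of the local folder
    return cmd


cmd = azcopy_upload_command("./exports", "mystorageacct", "raw", "<SAS-token>")
print(shlex.join(cmd))
```

With AzCopy installed and a real SAS token in place of the placeholder, the list could be passed to `subprocess.run(cmd, check=True)` to perform the copy.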
Tier Changes and Rehydration Mechanics
Changing a blob's tier is metadata-only for the online tiers (Hot, Cool, Cold) and takes effect quickly. Moving into Archive is also quick, but moving out of Archive (rehydration) is the slow path: you choose a standard priority (up to ~15 hours) or high priority (faster, costs more) and the blob is unreadable until rehydration completes. A common trap presents a need to read archived data "immediately" — the correct response is that Archive cannot serve immediate reads, so if low-latency access is required the data should not be in Archive in the first place.
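The behaviour described above can be pictured as a small state model. The class below is a toy simulation, not the Azure SDK: every name in it is hypothetical, and `complete_rehydration` stands in for the hours of waiting that the real service imposes.

```python
ONLINE_TIERS = {"Hot", "Cool", "Cold"}


class ArchivedBlobError(Exception):
    """Raised when a read is attempted on a blob that is not in an online tier."""


class BlobTierModel:
    def __init__(self, tier="Hot"):
        self.tier = tier
        self.rehydrating_to = None

    def set_tier(self, target):
        if self.tier == "Archive" and target in ONLINE_TIERS:
            # Slow path: the blob stays in Archive until rehydration finishes.
            self.rehydrating_to = target
        else:
            # Online-to-online changes and moves into Archive apply right away.
            self.tier = target
            self.rehydrating_to = None

    def complete_rehydration(self):
        if self.rehydrating_to is not None:
            self.tier, self.rehydrating_to = self.rehydrating_to, None

    def read(self):
        if self.tier not in ONLINE_TIERS:
            raise ArchivedBlobError("blob is offline; rehydrate it first")
        return "blob contents"


blob = BlobTierModel("Hot")
blob.set_tier("Archive")   # quick
blob.set_tier("Hot")       # starts rehydration; the blob is still unreadable
```

Calling `blob.read()` at this point raises `ArchivedBlobError`; only after `blob.complete_rehydration()` does the read succeed, which mirrors the exam trap about "immediate" access to archived data.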
Immutability, Soft Delete, and Versioning
Blob storage offers data-protection features the exam may touch on:
- Soft delete retains deleted blobs (and containers) for a configurable window so accidental deletes can be recovered.
- Versioning automatically keeps prior versions of a blob on each overwrite.
- Immutable storage (WORM) applies time-based or legal-hold policies so blobs cannot be modified or deleted until the policy expires — used for regulatory compliance such as financial records.
Putting Tiers and Lifecycle Together
A mature blob strategy combines tiers with a lifecycle policy: data lands in Hot while it is actively used, a policy moves it to Cool after 30 days of no access, then to Archive after 90 days, and finally deletes it after the legal retention period. The same policy can act on last-access time rather than creation time, so genuinely active data stays Hot while truly dormant data drifts cheaply downward — minimizing cost without manual intervention, which is the operational point the exam rewards.
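The drift described above can be traced with a tiny evaluator. This is a simplified sketch that measures every rule against a single "days since last access" clock, which a real policy need not do; the function name and default thresholds are assumptions taken from the example in the text.

```python
def tier_after_policy(days_idle, cool_after=30, archive_after=90, delete_after=7 * 365):
    """Return where a blob ends up after `days_idle` days without access."""
    if days_idle > delete_after:
        return "Deleted"
    if days_idle > archive_after:
        return "Archive"
    if days_idle > cool_after:
        return "Cool"
    return "Hot"  # still within the active window


timeline = [(days, tier_after_policy(days)) for days in (10, 31, 91, 2556)]
```

Because the clock resets whenever the blob is read, data that stays in use never crosses the first threshold and remains in Hot.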
A compliance team must retain audit files for seven years. The files will almost never be read, retrieval latency of several hours is acceptable, and storage cost must be minimized. Which Azure Blob access tier fits best?
What is the relationship between Azure Blob storage and Azure Data Lake Storage Gen2?