3.2 Discover Data: OneLake Catalog & Real-Time Hub
Key Takeaways
- The OneLake catalog is the tenant-wide discovery surface for governed analytical items (lakehouses, warehouses, semantic models, KQL databases) with search, lineage, and endorsement filtering.
- The Real-Time hub is the discovery and management surface specifically for streaming and event data: eventstreams, KQL data, Azure event sources, and Fabric events.
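- Endorsement has two main levels: Promoted (anyone with write permission on the item can apply it) and Certified (only authorized reviewers can apply it). Where enabled, a Master data badge additionally marks canonical reference data. Endorsed items surface higher and are visually flagged in discovery.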
- Discovering and reusing an existing certified semantic model or lakehouse prevents duplicate, ungoverned data sprawl and is the exam-preferred answer over rebuilding.
- Sensitivity labels and endorsement are governance signals, not access control; they help users find trusted data but do not by themselves grant or deny permissions.
Discovery Before You Build
A recurring DP-600 theme: the correct engineering choice is often to find and reuse a governed asset rather than re-ingest the same data. The exam rewards answers that say "check the catalog / Real-Time hub first." Reusing a certified semantic model or an existing lakehouse table avoids duplicate storage, conflicting logic, and ungoverned shadow datasets.
OneLake Catalog
The OneLake catalog is the tenant-wide place to discover and govern analytical data items you have access to:
- Explore tab: search and browse lakehouses, warehouses, semantic models, KQL databases, and other items across workspaces.
- Govern tab: surfaces governance posture (sensitivity labels, endorsement, refresh status, and item insights) to help data owners improve trust.
- Lineage: shows upstream sources and downstream dependents so you understand the impact before changing or reusing an item.
- Filters: narrow results by item type, workspace, endorsement, and sensitivity label.
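The filtering behavior described above can be pictured with a small in-memory sketch. This is an illustration only, not a Fabric API: the `CatalogItem` class, the `filter_items` helper, and the sample items are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a catalog entry; not a real Fabric type.
@dataclass(frozen=True)
class CatalogItem:
    name: str
    item_type: str     # e.g. "Lakehouse", "SemanticModel"
    workspace: str
    endorsement: str   # "None", "Promoted", or "Certified"
    sensitivity: str   # e.g. "General", "Confidential"


def filter_items(items, item_type=None, workspace=None,
                 endorsement=None, sensitivity=None):
    """Return items matching every filter that was supplied (None = no filter)."""
    wanted = {
        "item_type": item_type,
        "workspace": workspace,
        "endorsement": endorsement,
        "sensitivity": sensitivity,
    }
    return [
        item for item in items
        if all(value is None or getattr(item, field) == value
               for field, value in wanted.items())
    ]


# Made-up sample data for illustration.
ITEMS = [
    CatalogItem("Sales Model", "SemanticModel", "Finance", "Certified", "Confidential"),
    CatalogItem("Sales Scratch", "SemanticModel", "Sandbox", "None", "General"),
    CatalogItem("Raw Orders", "Lakehouse", "Finance", "Promoted", "General"),
]

certified_models = filter_items(ITEMS, item_type="SemanticModel",
                                endorsement="Certified")
print([item.name for item in certified_models])
```

Combining filters narrows the result set the same way stacking filters does in the catalog: each added filter can only remove items, never add them.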
Real-Time Hub
The Real-Time hub is the dedicated discovery and management surface for streaming and event data — the equivalent of the catalog for data in motion:
- Browse and connect to eventstreams, KQL databases/eventhouse data, Fabric events, and Azure event sources (Event Hubs, IoT Hub, Azure Blob Storage events).
- Preview streams, create eventstreams, and route events to destinations such as an eventhouse or lakehouse.
- It is where you go when the scenario mentions streaming, telemetry, IoT, or events and you need to find or wire up a real-time source.
Catalog vs. Real-Time Hub
| Need | Use |
|---|---|
| Find an existing lakehouse, warehouse, or semantic model to reuse | OneLake catalog |
| Inspect lineage and endorsement before changing a model | OneLake catalog |
| Discover available event streams or telemetry sources | Real-Time hub |
| Connect to Azure Event Hubs / IoT Hub and route events | Real-Time hub |
| Find a certified dataset for self-service reporting | OneLake catalog (filter by Certified) |
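As a study aid, the decision rule in the table can be written as a tiny keyword check. This is a rough sketch, not anything Fabric provides: the `STREAMING_HINTS` set and the `discovery_surface` function are hypothetical, and real exam scenarios need more judgment than keyword matching.

```python
# Hypothetical keyword list; scenarios mentioning these usually point
# to data in motion rather than data at rest.
STREAMING_HINTS = {"stream", "streams", "streaming", "telemetry",
                   "iot", "event", "events", "eventstream"}


def discovery_surface(scenario: str) -> str:
    """Pick the discovery surface suggested by a scenario description."""
    # Normalize words: lowercase and strip surrounding punctuation.
    words = {w.strip(".,;:!?()").lower() for w in scenario.split()}
    if words & STREAMING_HINTS:
        return "Real-Time hub"
    return "OneLake catalog"


print(discovery_surface("Find an existing lakehouse, warehouse, or semantic model to reuse"))
print(discovery_surface("Connect to Azure Event Hubs / IoT Hub and route events"))
```

The default branch returns the OneLake catalog on purpose: unless a scenario signals streaming or event data, the catalog is the place to look first.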
Endorsement: Surfacing Trusted Data
Endorsement is how organizations signal which items are trustworthy. It directly affects discovery ranking and the badges users see.
| Level | Who can apply | Meaning | Discovery effect |
|---|---|---|---|
| None | n/a | Unendorsed / personal | Lower visibility |
| Promoted | Any item contributor/owner | "This is ready to use" | Promoted badge, surfaced higher |
| Certified | Only users authorized by the tenant/domain admin | "Org-reviewed, authoritative" | Certified badge, surfaced highest |
| Master data (where enabled) | Authorized reviewers | Canonical reference data | Highlighted as canonical |
Key exam distinctions you must keep straight:
- Endorsement is a trust signal, not access control. A certified semantic model still requires permissions to actually consume; certification does not grant access.
- Promoted is self-service; Certified is gated. Anyone who can edit an item can promote it; only authorized reviewers can certify it. This restriction is what makes Certified meaningful.
- Endorsement is not deployment and not a sensitivity label. Endorsement surfaces trust in discovery; deployment pipelines move content between stages; sensitivity labels classify and can drive protection — three separate concepts the exam likes to blur in distractors.
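The first two distinctions can be made concrete with a toy model. This is a simplified sketch under stated assumptions, not how Fabric implements endorsement: the `Item` class, the `promote`, `certify`, and `can_read` functions, and the user names are all hypothetical, and the model only encodes the rules stated in this section.

```python
from dataclasses import dataclass, field


# Hypothetical toy model; permissions and endorsement are kept in
# separate fields to show that one never changes the other.
@dataclass
class Item:
    name: str
    editors: set = field(default_factory=set)   # users with write permission
    readers: set = field(default_factory=set)   # users with read permission
    endorsement: str = "None"


def promote(item: Item, user: str) -> None:
    """Promoted is self-service: anyone who can edit the item may apply it."""
    if user not in item.editors:
        raise PermissionError(f"{user} cannot edit {item.name}")
    item.endorsement = "Promoted"


def certify(item: Item, user: str, authorized_certifiers: set) -> None:
    """Certified is gated: only admin-authorized reviewers may apply it."""
    if user not in authorized_certifiers:
        raise PermissionError(f"{user} is not authorized to certify")
    item.endorsement = "Certified"


def can_read(item: Item, user: str) -> bool:
    """Access depends only on permissions, never on endorsement."""
    return user in item.readers or user in item.editors


model = Item("Revenue Model", editors={"ana"}, readers={"raj"})
promote(model, "ana")                 # allowed: ana can edit the item
certify(model, "lee", {"lee"})        # allowed: lee is an authorized reviewer
print(model.endorsement)              # the badge changed...
print(can_read(model, "sam"))         # ...but sam still has no access
```

Note that `certify` never touches `readers` or `editors`. That separation is the point: a badge changes how an item is found and trusted, while consuming it still requires permissions granted separately.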
When a scenario asks how to make a governed dataset easy for analysts to find and trust, the answer is to endorse it (Certify it if review is required) so it ranks and badges appropriately in the OneLake catalog — not to copy it into every workspace.
A data analyst is about to build a new lakehouse and pipeline to calculate company-wide revenue. Before starting, what is the BEST first step according to Fabric data-governance practice?