3.2 Discover Data: OneLake Catalog & Real-Time Hub

Key Takeaways

  • The OneLake catalog is the tenant-wide discovery surface for governed analytical items (lakehouses, warehouses, semantic models, KQL databases) with search, lineage, and endorsement filtering.
  • The Real-Time hub is the discovery and management surface specifically for streaming and event data: eventstreams, KQL data, Azure event sources, and Fabric events.
  • Endorsement has two main levels: Promoted (any contributor can apply) and Certified (only authorized reviewers). A Master data badge may also be available where enabled. Endorsed items surface higher and are visually flagged in discovery.
  • Discovering and reusing an existing certified semantic model or lakehouse prevents duplicate, ungoverned data sprawl and is the exam-preferred answer over rebuilding.
  • Sensitivity labels and endorsement are governance signals, not access control; they help users find trusted data but do not by themselves grant or deny permissions.
Last updated: May 2026

Discovery Before You Build

A recurring DP-600 theme: the correct engineering choice is often to find and reuse a governed asset rather than re-ingest the same data. The exam rewards answers that say "check the catalog / Real-Time hub first." Reusing a certified semantic model or an existing lakehouse table avoids duplicate storage, conflicting logic, and ungoverned shadow datasets.

OneLake Catalog

The OneLake catalog is the tenant-wide place to discover and govern analytical data items you have access to:

  • Explore tab: search and browse lakehouses, warehouses, semantic models, KQL databases, and other items across workspaces.
  • Govern tab: surfaces governance posture (sensitivity labels, endorsement, refresh status, and item insights) to help data owners improve trust.
  • Lineage: shows upstream sources and downstream dependents so you understand impact before changing or reusing an item.
  • Filters: narrow by item type, workspace, endorsement, and sensitivity label.
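
The lineage idea above (knowing what depends on an item before you change it) can be sketched as a simple graph walk. This is a minimal illustration, not a Fabric API: the `DOWNSTREAM` mapping, the item names, and the `downstream_dependents` helper are all hypothetical, and in practice you would read this information from the lineage view rather than compute it yourself.

```python
from collections import deque

# Hypothetical lineage edges: each item maps to the items that consume it directly.
# Item names are illustrative and do not come from a real tenant.
DOWNSTREAM = {
    "sales_lakehouse": ["sales_warehouse", "revenue_model"],
    "sales_warehouse": ["revenue_model"],
    "revenue_model": ["exec_report", "regional_report"],
    "exec_report": [],
    "regional_report": [],
}


def downstream_dependents(item, edges):
    """Return every item reachable downstream of `item`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(edges.get(item, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already counted via another path
        seen.add(current)
        order.append(current)
        queue.extend(edges.get(current, []))
    return order


# Everything that could be affected by a change to the lakehouse.
impacted = downstream_dependents("sales_lakehouse", DOWNSTREAM)
print(impacted)
```

A non-empty result is the signal to coordinate with downstream owners before changing the item; an empty result suggests the change is isolated.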

Real-Time Hub

The Real-Time hub is the dedicated discovery and management surface for streaming and event data — the equivalent of the catalog for data in motion:

  • Browse and connect to eventstreams, KQL databases/eventhouse data, Fabric events, and Azure event sources (Event Hubs, IoT Hub, blob events).
  • Preview streams, create eventstreams, and route events to destinations such as an eventhouse or lakehouse.
  • It is where you go when the scenario mentions streaming, telemetry, IoT, or events and you need to find or wire up a real-time source.

Catalog vs. Real-Time Hub

Need | Use
Find an existing lakehouse, warehouse, or semantic model to reuse | OneLake catalog
Inspect lineage and endorsement before changing a model | OneLake catalog
Discover available event streams or telemetry sources | Real-Time hub
Connect to Azure Event Hubs / IoT Hub and route events | Real-Time hub
Find a certified dataset for self-service reporting | OneLake catalog (filter by Certified)

Endorsement: Surfacing Trusted Data

Endorsement is how organizations signal which items are trustworthy. It directly affects discovery ranking and the badges users see.

Level | Who can apply | Meaning | Discovery effect
None | n/a | Unendorsed / personal | Lower visibility
Promoted | Any item contributor/owner | "This is ready to use" | Promoted badge, surfaced higher
Certified | Only users authorized by the tenant/domain admin | "Org-reviewed, authoritative" | Certified badge, surfaced highest
Master data (where enabled) | Authorized reviewers | Canonical reference data | Highlighted as canonical
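
To make the ordering in the table concrete, here is a small sketch of how endorsement could influence what an analyst sees first. It is a toy model under stated assumptions: `CatalogItem`, `ENDORSEMENT_RANK`, and both helper functions are hypothetical names, the numeric ranks are invented for illustration, and Master data is left out because the table does not state where it ranks. It does not reproduce how Fabric actually orders results.

```python
from dataclasses import dataclass

# Assumed ordering for illustration only: a higher number is surfaced higher.
ENDORSEMENT_RANK = {"None": 0, "Promoted": 1, "Certified": 2}


@dataclass(frozen=True)
class CatalogItem:
    name: str
    item_type: str
    endorsement: str = "None"


def rank_for_discovery(items):
    """Sort items so higher endorsement comes first, then alphabetically by name."""
    return sorted(items, key=lambda i: (-ENDORSEMENT_RANK[i.endorsement], i.name))


def filter_by_endorsement(items, level):
    """Mimic an endorsement filter: keep only items at exactly the given level."""
    return [i for i in items if i.endorsement == level]


items = [
    CatalogItem("revenue_scratch", "SemanticModel"),
    CatalogItem("revenue_team", "SemanticModel", "Promoted"),
    CatalogItem("revenue_official", "SemanticModel", "Certified"),
]

ranked_names = [i.name for i in rank_for_discovery(items)]
certified_names = [i.name for i in filter_by_endorsement(items, "Certified")]
print(ranked_names)
print(certified_names)
```

The point of the sketch is the behavior, not the numbers: the certified model is what a user meets first, which is why certifying one authoritative item beats copying it into many workspaces.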

Key exam distinctions you must keep straight:

  • Endorsement is a trust signal, not access control. A certified semantic model still requires permissions to actually consume; certification does not grant access.
  • Promoted is self-service; Certified is gated. Anyone who can edit an item can promote it; only authorized reviewers can certify it. This restriction is what makes Certified meaningful.
  • Endorsement is not deployment and not a sensitivity label. Endorsement surfaces trust in discovery; deployment pipelines move content between stages; sensitivity labels classify and can drive protection — three separate concepts the exam likes to blur in distractors.
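
The first distinction (trust signal versus access control) is easy to get wrong under exam pressure, so a tiny model may help. This is a hypothetical sketch, not how Fabric stores permissions: the `Item` class, its `readers` set, and `can_consume` are invented for illustration. The only thing it demonstrates is that the access check never looks at the endorsement field.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    endorsement: str = "None"  # trust signal shown in discovery
    readers: set = field(default_factory=set)  # hypothetical users with read permission


def can_consume(user, item):
    """Access depends only on permissions; endorsement is never consulted."""
    return user in item.readers


certified = Item("revenue_model", endorsement="Certified", readers={"ana"})
scratch = Item("scratch_model", endorsement="None", readers={"ana", "ben"})

# "ben" can see that revenue_model is certified but still cannot use it
# until someone grants him permission.
print(can_consume("ben", certified))
print(can_consume("ben", scratch))
```

In this model, certifying or un-certifying an item changes nothing about who can read it, which mirrors the rule stated above.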

When a scenario asks how to make a governed dataset easy for analysts to find and trust, the answer is to endorse it (Certify it if review is required) so it ranks and badges appropriately in the OneLake catalog — not to copy it into every workspace.

Test Your Knowledge

A data analyst is about to build a new lakehouse and pipeline to calculate company-wide revenue. Before starting, what is the BEST first step according to Fabric data-governance practice?
