2.1 Data Roles and Workloads

Key Takeaways

  • Database administrators (DBAs) own availability, backups, security, and performance of a database system, not the design of analytical models.
  • Data engineers build and operate data pipelines, ingestion, transformation, and storage layers that feed analysts and scientists.
  • Data analysts produce reports and visualizations from cleaned data, typically using Power BI, SQL, and DAX.
  • Data scientists apply statistics and machine learning to build predictive and prescriptive models, often working in Azure Machine Learning or notebooks.
  • OLTP workloads process many small read/write transactions in near real time; OLAP workloads scan large historical datasets to support analysis and reporting.
Last updated: June 2026

Microsoft Learn frames the DP-900 exam around three ideas: the data, the people who work with it, and the workload that uses it. Before you can match an Azure service to a scenario, you have to recognize who is doing what.

The Four Core Data Roles

Microsoft groups data professionals into four reference roles. Each role has its own typical tasks, tools, and Microsoft certifications.

RolePrimary responsibilityTypical Azure toolsMicrosoft cert path
Database Administrator (DBA)Keep databases available, secure, performant, and recoverableAzure SQL, SQL Server on Azure VMs, SQL Managed InstanceDP-300
Data EngineerBuild pipelines that ingest, transform, and store data at scaleAzure Data Factory, Synapse Analytics, Microsoft Fabric, DatabricksDP-700
Data AnalystModel and visualize cleaned data so the business can decidePower BI, Power BI Service, DAX, SQLPL-300
Data ScientistTrain and operationalize statistical and machine-learning modelsAzure Machine Learning, notebooks, Python/RDP-100

Database Administrator

The DBA is responsible for the runtime health of a database. Day-to-day work includes installing and patching database engines, creating logins and role assignments, configuring backups and point-in-time restore, tuning indexes, watching for blocking and deadlocks, and meeting SLAs. On Azure, a DBA spends much less time on hardware or OS patching because services such as Azure SQL Database and Azure SQL Managed Instance are platform as a service (PaaS) — Microsoft owns the underlying infrastructure.

Data Engineer

The data engineer builds the plumbing. They design ingestion from operational sources, write transformation logic (T-SQL, PySpark, dataflows), and land data in formats that analysts and scientists can query. In Azure, this work usually happens in Azure Data Factory for orchestration, Azure Synapse Analytics or Microsoft Fabric for storage and compute, and Azure Databricks for large-scale Spark transformations.

Data Analyst

The data analyst turns curated data into answers. They author semantic models, write DAX measures, design dashboards, and tell stories with visuals. In the Microsoft stack, the analyst's home is Power BI Desktop for authoring and the Power BI Service (or Microsoft Fabric) for sharing.

Data Scientist

The data scientist applies statistics and machine learning to predict and prescribe. They explore data in notebooks, engineer features, train models, evaluate accuracy, and deploy models as endpoints. On Azure this lives in Azure Machine Learning and increasingly in Fabric Data Science.

Workload Types: OLTP vs OLAP

A workload is the kind of work a data system is doing. DP-900 cares about two main categories.

CharacteristicOLTP (Transactional)OLAP (Analytical)
PurposeRun the business in real timeUnderstand the business over time
TransactionsMany small reads and writesFew large reads, mostly aggregations
Latency targetMillisecondsSeconds to minutes
SchemaNormalizedDenormalized, often star schema
Typical userApplicationAnalyst, data scientist, executive
Azure exampleAzure SQL Database, Cosmos DBSynapse dedicated SQL pool, Fabric warehouse

A point-of-sale system writing one sales row per scan is OLTP. The end-of-quarter report that scans 500 million sales rows to compute revenue by region is OLAP.

Batch vs Streaming Ingestion

Independently of OLTP vs OLAP, the way data arrives at a system is either batch or streaming.

  • Batch processing moves data in large groups on a schedule (every hour, every night). It is latency-tolerant but throughput-friendly. The classic Azure example is an ADF pipeline that copies the previous day's sales files into the warehouse at 2 AM.
  • Stream processing treats each event as it arrives, often within seconds. Stock ticks, IoT telemetry, and clickstreams are streaming. Azure offers Azure Stream Analytics, Event Hubs, and Microsoft Fabric Real-Time Intelligence for these patterns.

Many real systems combine both: streaming for the live dashboard, batch for the historical warehouse load. Recognizing which side of the line a scenario sits on is a recurring DP-900 question pattern.

How the Roles Hand Off to Each Other

The four roles form a pipeline, and DP-900 often tests the boundary between them. Raw operational data is owned by the DBA, who guarantees it is available and correct. The data engineer picks that data up, moves it into the analytics estate, and shapes it into trusted tables. The data analyst consumes those trusted tables to build reports, and the data scientist consumes them to train models.

A common trap question describes a task that sits on a boundary — "who designs the nightly pipeline?" (engineer, not DBA) or "who builds the Power BI semantic model?" (analyst, not engineer). Match the verb: administer and protect (DBA), ingest and transform (engineer), model and visualize (analyst), predict and prescribe (scientist).

The Five Analytics Maturity Levels

Microsoft Learn frames analytical value along a maturity ladder, and you should be able to order it. Each level answers a different question and is increasingly valuable and complex.

LevelQuestion it answersExample
DescriptiveWhat happened?Last quarter's total sales
DiagnosticWhy did it happen?Sales fell because a region underperformed
PredictiveWhat will happen?Forecast next quarter's demand
PrescriptiveWhat should we do?Recommend reorder quantities
CognitiveLearn and adapt automaticallyA system that retrains itself on new data

Descriptive and diagnostic analytics are typically the analyst's territory; predictive, prescriptive, and cognitive analytics are the data scientist's. A question that asks "which type of analytics recommends an action?" is testing prescriptive; "forecasts a future value?" is predictive.

Where the Workloads Run

Mapping the role-and-workload framing onto Azure: OLTP runs on the Azure SQL family and Cosmos DB; OLAP runs on Synapse, Microsoft Fabric, and Databricks; batch ingestion is orchestrated by Data Factory; streaming flows through Event Hubs, IoT Hub, and Stream Analytics; and the analyst's reporting layer is Power BI. Recognizing both who does the work and which service hosts it is the combined skill the core-concepts domain rewards, because nearly every later exam question is a variation on "this person, doing this workload, should use which Azure service?"

Test Your Knowledge

A retail company needs someone to design ingestion pipelines that move POS data from 1,200 stores into a central analytics platform every night, transforming the data along the way. Which data role best fits this responsibility?

A
B
C
D
Test Your Knowledge

Which pair of characteristics most clearly identifies an OLTP workload rather than an OLAP workload?

A
B
C
D