2.1 Data Roles and Workloads
Key Takeaways
- Database administrators (DBAs) own availability, backups, security, and performance of a database system, not the design of analytical models.
- Data engineers build and operate data pipelines, ingestion, transformation, and storage layers that feed analysts and scientists.
- Data analysts produce reports and visualizations from cleaned data, typically using Power BI, SQL, and DAX.
- Data scientists apply statistics and machine learning to build predictive and prescriptive models, often working in Azure Machine Learning or notebooks.
- OLTP workloads process many small read/write transactions in near real time; OLAP workloads scan large historical datasets to support analysis and reporting.
Microsoft Learn frames the DP-900 exam around three ideas: the data, the people who work with it, and the workload that uses it. Before you can match an Azure service to a scenario, you have to recognize who is doing what.
The Four Core Data Roles
Microsoft groups data professionals into four reference roles. Each role has its own typical tasks, tools, and Microsoft certifications.
| Role | Primary responsibility | Typical Azure tools | Microsoft cert path |
|---|---|---|---|
| Database Administrator (DBA) | Keep databases available, secure, performant, and recoverable | Azure SQL, SQL Server on Azure VMs, SQL Managed Instance | DP-300 |
| Data Engineer | Build pipelines that ingest, transform, and store data at scale | Azure Data Factory, Synapse Analytics, Microsoft Fabric, Databricks | DP-700 |
| Data Analyst | Model and visualize cleaned data so the business can decide | Power BI, Power BI Service, DAX, SQL | PL-300 |
| Data Scientist | Train and operationalize statistical and machine-learning models | Azure Machine Learning, notebooks, Python/R | DP-100 |
Database Administrator
The DBA is responsible for the runtime health of a database. Day-to-day work includes installing and patching database engines, creating logins and role assignments, configuring backups and point-in-time restore, tuning indexes, watching for blocking and deadlocks, and meeting SLAs. On Azure, a DBA spends much less time on hardware or OS patching because services such as Azure SQL Database and Azure SQL Managed Instance are platform as a service (PaaS) — Microsoft owns the underlying infrastructure.
Data Engineer
The data engineer builds the plumbing. They design ingestion from operational sources, write transformation logic (T-SQL, PySpark, dataflows), and land data in formats that analysts and scientists can query. In Azure, this work usually happens in Azure Data Factory for orchestration, Azure Synapse Analytics or Microsoft Fabric for storage and compute, and Azure Databricks for large-scale Spark transformations.
Data Analyst
The data analyst turns curated data into answers. They author semantic models, write DAX measures, design dashboards, and tell stories with visuals. In the Microsoft stack, the analyst's home is Power BI Desktop for authoring and the Power BI Service (or Microsoft Fabric) for sharing.
Data Scientist
The data scientist applies statistics and machine learning to predict and prescribe. They explore data in notebooks, engineer features, train models, evaluate accuracy, and deploy models as endpoints. On Azure this lives in Azure Machine Learning and increasingly in Fabric Data Science.
Workload Types: OLTP vs OLAP
A workload is the kind of work a data system is doing. DP-900 cares about two main categories.
| Characteristic | OLTP (Transactional) | OLAP (Analytical) |
|---|---|---|
| Purpose | Run the business in real time | Understand the business over time |
| Transactions | Many small reads and writes | Few large reads, mostly aggregations |
| Latency target | Milliseconds | Seconds to minutes |
| Schema | Normalized | Denormalized, often star schema |
| Typical user | Application | Analyst, data scientist, executive |
| Azure example | Azure SQL Database, Cosmos DB | Synapse dedicated SQL pool, Fabric warehouse |
A point-of-sale system writing one sales row per scan is OLTP. The end-of-quarter report that scans 500 million sales rows to compute revenue by region is OLAP.
Batch vs Streaming Ingestion
Independently of OLTP vs OLAP, the way data arrives at a system is either batch or streaming.
- Batch processing moves data in large groups on a schedule (every hour, every night). It is latency-tolerant but throughput-friendly. The classic Azure example is an ADF pipeline that copies the previous day's sales files into the warehouse at 2 AM.
- Stream processing treats each event as it arrives, often within seconds. Stock ticks, IoT telemetry, and clickstreams are streaming. Azure offers Azure Stream Analytics, Event Hubs, and Microsoft Fabric Real-Time Intelligence for these patterns.
Many real systems combine both: streaming for the live dashboard, batch for the historical warehouse load. Recognizing which side of the line a scenario sits on is a recurring DP-900 question pattern.
How the Roles Hand Off to Each Other
The four roles form a pipeline, and DP-900 often tests the boundary between them. Raw operational data is owned by the DBA, who guarantees it is available and correct. The data engineer picks that data up, moves it into the analytics estate, and shapes it into trusted tables. The data analyst consumes those trusted tables to build reports, and the data scientist consumes them to train models.
A common trap question describes a task that sits on a boundary — "who designs the nightly pipeline?" (engineer, not DBA) or "who builds the Power BI semantic model?" (analyst, not engineer). Match the verb: administer and protect (DBA), ingest and transform (engineer), model and visualize (analyst), predict and prescribe (scientist).
The Five Analytics Maturity Levels
Microsoft Learn frames analytical value along a maturity ladder, and you should be able to order it. Each level answers a different question and is increasingly valuable and complex.
| Level | Question it answers | Example |
|---|---|---|
| Descriptive | What happened? | Last quarter's total sales |
| Diagnostic | Why did it happen? | Sales fell because a region underperformed |
| Predictive | What will happen? | Forecast next quarter's demand |
| Prescriptive | What should we do? | Recommend reorder quantities |
| Cognitive | Learn and adapt automatically | A system that retrains itself on new data |
Descriptive and diagnostic analytics are typically the analyst's territory; predictive, prescriptive, and cognitive analytics are the data scientist's. A question that asks "which type of analytics recommends an action?" is testing prescriptive; "forecasts a future value?" is predictive.
Where the Workloads Run
Mapping the role-and-workload framing onto Azure: OLTP runs on the Azure SQL family and Cosmos DB; OLAP runs on Synapse, Microsoft Fabric, and Databricks; batch ingestion is orchestrated by Data Factory; streaming flows through Event Hubs, IoT Hub, and Stream Analytics; and the analyst's reporting layer is Power BI. Recognizing both who does the work and which service hosts it is the combined skill the core-concepts domain rewards, because nearly every later exam question is a variation on "this person, doing this workload, should use which Azure service?"
A retail company needs someone to design ingestion pipelines that move POS data from 1,200 stores into a central analytics platform every night, transforming the data along the way. Which data role best fits this responsibility?
Which pair of characteristics most clearly identifies an OLTP workload rather than an OLAP workload?