Which agreement governs commitments between two internal teams within the same data centre provider, rather than with the customer or an outside supplier?

Operational Level Agreement (OLA). An Operational Level Agreement is internal, between teams inside the provider, and exists to support delivery of the customer-facing SLA. The SLA is the provider-to-customer contract, the underpinning contract is with an external third-party supplier, and a MOP is an operational procedure rather than a service agreement.

A team replaces a UPS fan only when online battery-impedance and thermal-imaging data show it is beginning to degrade. Which maintenance strategy is this?

Predictive (condition-based) maintenance. Predictive maintenance times the work from monitored condition data such as vibration, thermal imaging, or impedance trends, intervening just before failure. Preventive maintenance would replace on a fixed schedule, corrective maintenance repairs after a fault, and run-to-failure deliberately waits for breakdown.

Which document provides step-by-step instructions for a specific change, including prerequisites, risk assessment, and a back-out plan?

MOP (Method of Procedure). A Method of Procedure documents exactly how to perform one specific scheduled activity, such as a UPS battery swap, including prerequisites, risk assessment, and abort/back-out criteria. SOPs cover routine repeatable tasks, EOPs cover emergency response, and an SLA is a service contract, not a work procedure.

Data Centre Operations — Free Study Guide 2026

Operational Procedures: SOP, MOP, EOP

Disciplined operations depend on written, version-controlled procedures. The three core document types are examined repeatedly, so master the distinction:

Document	Full name	Purpose
SOP	Standard Operating Procedure	Routine, repeatable operations: daily walk-throughs, start-up/shutdown, log checks, alarm acknowledgement
MOP	Method of Procedure	Step-by-step instructions for one specific change or maintenance task, with prerequisites, risk assessment, and a back-out plan
EOP	Emergency Operating Procedure	Response to emergencies: fire, loss of utility, EPO activation, flooding, or cooling failure
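A MOP can be treated as a structured record that is rejected while any mandatory section is empty. The sketch below is a minimal illustration, assuming a hypothetical `Mop` dataclass whose field names are our own rather than a format from any standard. It flags a draft that lacks prerequisites, a risk assessment, steps, or a back-out plan.

```python
from dataclasses import dataclass, field


@dataclass
class Mop:
    """Hypothetical Method of Procedure record (field names are illustrative only)."""
    title: str
    prerequisites: list[str] = field(default_factory=list)
    risk_assessment: str = ""
    steps: list[str] = field(default_factory=list)
    back_out_plan: list[str] = field(default_factory=list)


def missing_sections(mop: Mop) -> list[str]:
    """Return the names of mandatory sections that are still empty."""
    missing = []
    if not mop.prerequisites:
        missing.append("risk_assessment" if False else "prerequisites")
    if not mop.risk_assessment.strip():
        missing.append("risk_assessment")
    if not mop.steps:
        missing.append("steps")
    if not mop.back_out_plan:
        missing.append("back_out_plan")
    return missing


# A draft for a UPS battery swap that has not yet been risk-assessed
# and has no back-out plan, so it should not go to review.
draft = Mop(
    title="UPS-A battery string swap",
    prerequisites=["UPS-B carrying the load", "replacement string on site"],
    steps=["Transfer UPS-A to maintenance bypass", "Replace battery string 2"],
)
print(missing_sections(draft))  # ['risk_assessment', 'back_out_plan']
```

Returning the list of gaps, rather than a single pass/fail flag, tells the author exactly which sections to complete before the MOP is resubmitted.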

A fourth category, the abnormal-operating procedure (sometimes abbreviated AOP or AOR), sits between the SOP and the EOP: it governs conditions that are off-normal but not yet an emergency, for example running on generator, a UPS on bypass, or a failed cooling unit while the site is still stable. Note that the term AOR is also used for "Area of Responsibility" — the document that clarifies who owns which system — so read the context. Together these procedures form the operations playbook required by frameworks such as the Uptime Institute Management & Operations (M&O) Stamp of Approval and the EPI Data Centre Operating Standard (DCOS).
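The three-tier escalation (routine, abnormal but stable, emergency) can be expressed as a lookup from observed condition to procedure tier. This is a minimal sketch: the condition strings are hypothetical labels taken from the examples in this section, and the rule that an unrecognised condition escalates to the EOP tier is a conservative assumption of ours, not a requirement of any framework.

```python
from enum import Enum


class Tier(Enum):
    SOP = "routine"
    AOP = "abnormal but stable"
    EOP = "emergency"


# Hypothetical mapping built from the examples given for each procedure type.
CONDITION_TIER = {
    "daily walk-through": Tier.SOP,
    "alarm acknowledgement": Tier.SOP,
    "running on generator": Tier.AOP,
    "UPS on bypass": Tier.AOP,
    "single cooling unit failed, site stable": Tier.AOP,
    "fire": Tier.EOP,
    "flooding": Tier.EOP,
    "EPO activation": Tier.EOP,
}


def tier_for(condition: str) -> Tier:
    """Look up the procedure tier for a condition.

    Unrecognised conditions fall through to EOP (an assumed fail-safe choice).
    """
    return CONDITION_TIER.get(condition, Tier.EOP)


print(tier_for("UPS on bypass"))
```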

Service Agreements: SLA vs OLA vs Underpinning Contract

Operations must deliver against layered commitments drawn from ITIL and ISO/IEC 20000:

SLA (Service Level Agreement) — the customer-facing contract that quantifies availability (for example 99.99%), response and restoration times, and credits or penalties for breach.
OLA (Operational Level Agreement) — an internal agreement between teams within the same provider (for example the facilities team committing a two-hour response to the hosting team). OLAs underpin the SLA but never involve the customer.
Underpinning Contract (UC) — a contract with an external third-party supplier, such as a generator maintenance vendor, chiller service company, or telecom carrier. If a UC promises a four-hour parts response, the SLA to the customer cannot credibly promise faster.
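An availability percentage in an SLA translates directly into a yearly downtime budget: allowed downtime = (1 − availability) × minutes per year. The sketch below assumes a 365-day year and ignores leap years and any maintenance exclusions a real contract might define. For the 99.99% example it gives about 52.6 minutes per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored (assumption)


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Yearly downtime budget implied by an availability target given in percent."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/year")
```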

The key exam trap is direction: the SLA faces outward to the customer, the OLA stays internal, and the UC faces outward to a supplier. A chain of realistic OLAs and UCs is what makes an SLA achievable.
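The rule that an SLA cannot credibly promise faster than its underpinning commitments can be checked mechanically. This is a minimal sketch, assuming a hypothetical `Commitment` record and treating each OLA or UC response time as an independent floor on the SLA target.

```python
from dataclasses import dataclass


@dataclass
class Commitment:
    """Hypothetical record of one underpinning commitment."""
    name: str
    kind: str            # "OLA" (internal team) or "UC" (external supplier)
    response_hours: float


def unsupported_by(sla_hours: float, chain: list[Commitment]) -> list[str]:
    """Names of commitments slower than the customer-facing SLA target."""
    return [c.name for c in chain if c.response_hours > sla_hours]


chain = [
    Commitment("Facilities team response", "OLA", 2),
    Commitment("Generator vendor parts", "UC", 4),
]
print(unsupported_by(3, chain))  # ['Generator vendor parts']
print(unsupported_by(4, chain))  # []
```

If the steps happen in sequence, for example when the facilities team must diagnose a fault before the vendor is called, their times add together. In that case the check should compare the SLA against the sum along that path rather than against each commitment on its own.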

Maintenance Strategies

Data centre reliability is largely a maintenance discipline. Three strategies appear on the exam:

Preventive (planned) maintenance follows a fixed schedule regardless of current condition — quarterly generator load-bank tests, annual UPS battery capacity tests, filter changes. It reduces failure probability but can replace healthy parts early.
Predictive (condition-based) maintenance triggers work from monitored condition data such as vibration, thermal imaging, oil analysis, or battery impedance trends. It times intervention just before failure, cutting both downtime and wasted parts.
Corrective maintenance repairs equipment after a fault; run-to-failure deliberately waits for breakdown on non-critical, low-cost components. Reliability-Centred Maintenance (RCM) blends these strategies by criticality.
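The difference between preventive and predictive triggers is easiest to see side by side. In this minimal sketch, one function is driven only by the calendar and the other only by condition data. The 20% impedance-rise threshold and the milliohm readings are illustrative assumptions, not values from a standard or a battery manufacturer.

```python
from datetime import date


def preventive_due(last_service: date, interval_days: int, today: date) -> bool:
    """Calendar trigger: due once the fixed interval has elapsed, whatever the condition."""
    return (today - last_service).days >= interval_days


def predictive_due(baseline_mohm: float, readings_mohm: list[float],
                   max_rise_pct: float = 20.0) -> bool:
    """Condition trigger: due when the latest impedance reading has risen more than
    max_rise_pct above baseline. The 20% default is an illustrative assumption."""
    latest = readings_mohm[-1]
    rise_pct = (latest - baseline_mohm) / baseline_mohm * 100
    return rise_pct > max_rise_pct


today = date(2026, 3, 1)
# Preventive: an annual test last done 400 days ago is due even if the string is healthy.
print(preventive_due(date(2025, 1, 25), 365, today))
# Predictive: impedance creeping up from a 3.0 milliohm baseline.
print(predictive_due(3.0, [3.0, 3.1, 3.3, 3.7]))
```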

Capacity Management

Capacity management ensures that no single one of the facility's four resources (power, cooling, physical space, and network) runs out while the others still have headroom. A cabinet may have rack units free yet be power-capped; a row may have power yet lack cooling headroom. Capacity that exists but cannot be used because a paired resource is exhausted is called stranded capacity. Operators use DCIM (Data Center Infrastructure Management) tools, fed with real-time power and thermal data from the EPMS (Electrical Power Monitoring System) and BMS (Building Management System), to model headroom, forecast growth, and place new equipment where all four resources are available.
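Stranded capacity can be computed per cabinet. The scarcest resource sets how many devices fit, and whatever remains in the other resources is stranded. This is a minimal sketch, assuming a hypothetical cabinet and a per-server resource profile. The figures are made up to reproduce the "rack units free yet power-capped" case.

```python
from math import floor


def placeable_units(free: dict[str, float], per_device: dict[str, float]) -> int:
    """How many identical devices fit: the scarcest resource sets the limit."""
    return min(floor(free[r] / per_device[r]) for r in per_device)


def stranded(free: dict[str, float], per_device: dict[str, float]) -> dict[str, float]:
    """Headroom left in each resource once the limiting resource is exhausted."""
    n = placeable_units(free, per_device)
    return {r: free[r] - n * per_device[r] for r in per_device}


# Hypothetical cabinet headroom and per-server demand.
cabinet_free = {"power_kw": 2.0, "cooling_kw": 3.0, "space_u": 20, "network_ports": 12}
server = {"power_kw": 0.5, "cooling_kw": 0.5, "space_u": 1, "network_ports": 2}

print(placeable_units(cabinet_free, server))  # 4 (power-capped)
print(stranded(cabinet_free, server))
```

Here power runs out after four servers, which strands 1 kW of cooling, 16 U of space, and 4 network ports. A DCIM placement model performs the same comparison across many cabinets before it chooses a location.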

Change Management

Every non-routine action is a risk to uptime, so change management wraps it in controls. A proposed change is documented in a MOP, assessed for risk, reviewed (often by a change advisory board), and scheduled into a maintenance window. High-risk electrical work additionally requires Lockout-Tagout (LOTO) to prevent inadvertent re-energisation and arc-flash-rated PPE selected from an incident-energy analysis per NFPA 70E. A sound change always defines success criteria, abort/back-out triggers, and post-change verification — because the leading cause of data centre outages is human error during change, not equipment failure.
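The controls described above act as a gate that a change must pass before work starts. This minimal sketch assumes a hypothetical `ChangeRequest` record with one boolean per control. It returns every control that is still missing, and it adds the LOTO and arc-flash PPE checks only for electrical work.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    """Hypothetical change record; each flag stands for one control being in place."""
    mop_approved: bool
    risk_assessed: bool
    in_maintenance_window: bool
    electrical_work: bool
    loto_applied: bool = False
    arc_flash_ppe_verified: bool = False
    back_out_defined: bool = False


def blockers(cr: ChangeRequest) -> list[str]:
    """Return every control still missing; an empty list means the change may proceed."""
    checks = {
        "approved MOP": cr.mop_approved,
        "risk assessment": cr.risk_assessed,
        "scheduled maintenance window": cr.in_maintenance_window,
        "back-out triggers": cr.back_out_defined,
    }
    if cr.electrical_work:
        # Extra controls for electrical work, per the text above.
        checks["LOTO"] = cr.loto_applied
        checks["NFPA 70E arc-flash PPE"] = cr.arc_flash_ppe_verified
    return [name for name, ok in checks.items() if not ok]


cr = ChangeRequest(mop_approved=True, risk_assessed=True, in_maintenance_window=True,
                   electrical_work=True, loto_applied=True, back_out_defined=True)
print(blockers(cr))  # ['NFPA 70E arc-flash PPE']
```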

Operational Roles and Incident Management

Sound operations also depend on clear human structure. A staffed site runs on defined shifts with an Area of Responsibility (AOR) matrix or RACI chart stating who is accountable for each system, so no task falls between teams. Incidents are logged, triaged by severity, escalated on a defined path, and closed with a root-cause analysis whose lessons feed back into updated SOPs, MOPs, and EOPs. Maturity frameworks such as the EPI DCOS and the Uptime M&O Stamp assess this continuous-improvement loop directly.

Shift handovers use a structured log so that in-progress abnormal conditions (a generator run, a UPS on bypass, a suppressed alarm) are never lost between crews, and every maintenance window closes only after post-work verification confirms the site is back to its normal, fully redundant state.
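The requirement that no task falls between teams can be checked against the RACI chart itself. A common RACI convention is exactly one Accountable party per item and at least one Responsible party. This minimal sketch assumes a hypothetical matrix layout of `{system: {team: roles}}`, where `roles` is a string such as `"AR"`.

```python
def raci_gaps(matrix: dict[str, dict[str, str]]) -> dict[str, str]:
    """Flag systems without exactly one Accountable ('A') team,
    or with no Responsible ('R') team."""
    problems = {}
    for system, roles in matrix.items():
        accountable = [team for team, r in roles.items() if "A" in r]
        responsible = [team for team, r in roles.items() if "R" in r]
        if len(accountable) != 1:
            problems[system] = f"{len(accountable)} accountable teams"
        elif not responsible:
            problems[system] = "no responsible team"
    return problems


# Hypothetical site matrix with two ownership gaps.
matrix = {
    "UPS": {"Facilities": "AR", "IT": "I"},
    "CRAH units": {"Facilities": "R", "Mechanical vendor": "C"},
    "Core network": {"Network": "A", "IT": "A"},
}
print(raci_gaps(matrix))
```

The sample matrix flags the CRAH units, which have no accountable team, and the core network, which has two. These are the two ways a task can fall between teams.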

Common operations traps

SLA vs OLA direction: the SLA faces the customer, the OLA is internal, and an underpinning contract faces an external supplier.
SOP vs MOP: an SOP is the routine 'what to do'; a MOP is the change-specific 'how to do it' with a back-out plan.
Preventive vs predictive: preventive follows the calendar, predictive follows the condition data.
Human error: most outages occur during change, so no electrical work proceeds without an approved MOP, LOTO, and NFPA 70E arc-flash PPE.

Capacity management and change management are two sides of one coin: capacity management decides whether new load can be added safely, while change management controls how it is added. Both lean on DCIM as the shared system of record, and both ultimately exist to protect the availability the SLA promises.

EXIN Certified Data Centre Professional (CDCP)

7.2 Data Centre Operations