Redundancy, Operations, and MOU/MOA

Key Takeaways

  • Redundancy removes single points of failure across power, links, devices, cooling, paths, and providers — but only if it is meaningful, not cosmetic.
  • Common-mode failure (shared conduit, shared upstream switch, shared power strip) defeats redundancy that looks separate on a diagram.
  • An MOU records broad mutual understanding; an MOA defines specific agreed responsibilities; an SLA sets measurable service commitments.
  • Operational controls — runbooks, monitoring, maintenance windows, change control, and tested failover — make redundancy usable.
  • Resilience planning must cover both technical failure and coordination failure between teams or partners.
Last updated: June 2026

Resilience Is Design Plus Coordination

A network can have dual firewalls, two internet providers, redundant power, and backup cooling and still fail if nobody owns an alarm or can authorize a repair. CC scenarios test two skills: spotting a single point of failure (SPOF) and recognizing the human agreements that keep operations running.

Meaningful Versus Cosmetic Redundancy

Redundancy means more than one way to deliver a needed function: dual power supplies, UPS plus generator, redundant switches, high-availability (HA) firewall pairs, multiple internet circuits, diverse cable paths, replicated services, and a secondary site or cloud region. The trap is common-mode failure — components that appear separate but share a hidden dependency.

Looks redundant | Hidden shared dependency | Stronger design
Two ISP circuits | Same conduit into the building | Diverse carriers and diverse physical paths
HA firewall pair | One upstream core switch | Redundant upstream switching
Dual power supplies | Same power strip / breaker | Separate A and B feeds
Backup generator | No fuel contract or testing | Fuel SLA, scheduled load tests
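
One way to hunt for common-mode failure is to list every resource that each supposedly redundant path depends on, then look for overlaps. The Python sketch below is a minimal illustration under assumed inputs. The `PATHS` inventory, the resource names, and the `shared_dependencies` helper are hypothetical. A real inventory would come from cabling records, rack diagrams, and provider route maps.

```python
from itertools import combinations

# Hypothetical inventory: each "redundant" path lists the physical or logical
# resources it depends on. Names are illustrative, not from a real tool.
PATHS = {
    "isp_a": {"carrier_a", "conduit_north", "edge_router_1", "breaker_7"},
    "isp_b": {"carrier_b", "conduit_north", "edge_router_2", "breaker_8"},
}


def shared_dependencies(paths):
    """Return, for each pair of paths, the resources both depend on.

    Any non-empty result is a candidate common-mode failure: losing that
    one resource takes down both "redundant" paths at once.
    """
    findings = {}
    for (name_a, deps_a), (name_b, deps_b) in combinations(paths.items(), 2):
        common = deps_a & deps_b
        if common:
            findings[(name_a, name_b)] = common
    return findings


for pair, common in shared_dependencies(PATHS).items():
    print(f"{pair[0]} and {pair[1]} share: {sorted(common)}")
```

The check is pairwise, which matches the two-path designs in the table. With three or more paths, a resource is a true single point of failure only if every path depends on it. A pairwise overlap still marks a weak spot worth fixing.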

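Resilience is also measurable. Recovery Time Objective (RTO) is the maximum acceptable time to restore a service after a disruption. Recovery Point Objective (RPO) is the maximum acceptable data loss, usually expressed as a window of time (for example, no more than 15 minutes of transactions). Redundancy choices should map to those targets, not to a vendor brochure.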
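
RTO and RPO can be checked against candidate designs with simple arithmetic. This sketch assumes periodic copies, so the worst-case data loss is one full copy interval (the failure lands just before the next copy). The option names, intervals, and restore times are invented for illustration, and `meets_targets` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOption:
    name: str
    copy_interval_min: float  # assumed minutes between successful backups/replication
    restore_time_min: float   # assumed minutes to bring the service back


def meets_targets(option, rto_min, rpo_min):
    """True if the option restores within the RTO and loses no more than the RPO."""
    # Worst case for periodic copies: failure just before the next copy,
    # so up to one full interval of data is lost.
    worst_case_loss = option.copy_interval_min
    return option.restore_time_min <= rto_min and worst_case_loss <= rpo_min


options = [
    RecoveryOption("nightly backup, cold restore", 24 * 60, 8 * 60),
    RecoveryOption("async replica, warm standby", 15, 30),
    RecoveryOption("sync replica, hot standby", 0, 2),
]

# Hypothetical targets: restore within 60 minutes, lose at most 15 minutes of data.
for opt in options:
    print(opt.name, meets_targets(opt, rto_min=60, rpo_min=15))
```

The nightly backup fails both targets here, which is the point of the exercise: the targets drive the design, not the other way round.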

Operations: Making Redundancy Usable

Redundancy fails without operations behind it. Required pieces include:

  • Diagrams and runbooks so responders act without guessing.
  • Monitoring and alert routing so the right person is paged.
  • Maintenance windows and change control so a switch swap, cooling service, and firewall edit do not collide.
  • Configuration backups and spare parts so recovery does not wait on procurement.
  • Tested failover — a failover nobody has exercised may not work when needed.
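
The last item, tested failover, is easy to track. The following sketch flags any failover mechanism that has never been exercised or has not been exercised recently. The inventory, the dates, the `untested_failovers` helper, and the 180-day policy are hypothetical assumptions; each organization sets its own test frequency.

```python
from datetime import date, timedelta

# Hypothetical record of when each failover was last exercised end to end.
# None means it has never been tested.
LAST_TESTED = {
    "firewall_ha_pair": date(2026, 3, 2),
    "isp_failover": date(2025, 4, 18),
    "generator_transfer": None,
}


def untested_failovers(last_tested, today, max_age_days=180):
    """List failovers never tested or not tested within max_age_days.

    The 180-day default is an illustrative policy choice, not a standard.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        name
        for name, tested in last_tested.items()
        if tested is None or tested < cutoff
    )


print(untested_failovers(LAST_TESTED, today=date(2026, 6, 1)))
```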

Change control should identify affected systems, rollback steps, communication paths, and who has decision authority during an outage.
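
A change request can be checked for those four items before it is approved. This is a hypothetical sketch: the `ChangeRequest` fields map to the items just listed, but the class, the `missing_items` helper, and the example change are illustrative, not any particular ticketing system's schema.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    # Hypothetical record; each field corresponds to one required change-control item.
    summary: str
    affected_systems: list[str] = field(default_factory=list)
    rollback_steps: list[str] = field(default_factory=list)
    communication_paths: list[str] = field(default_factory=list)
    decision_authority: str = ""


def missing_items(cr):
    """Return the names of required change-control items that are still empty."""
    required = {
        "affected_systems": cr.affected_systems,
        "rollback_steps": cr.rollback_steps,
        "communication_paths": cr.communication_paths,
        "decision_authority": cr.decision_authority,
    }
    return [name for name, value in required.items() if not value]


cr = ChangeRequest(
    summary="Replace core switch 2",
    affected_systems=["core-sw-2", "fw-ha-pair uplink"],
    communication_paths=["#netops channel", "partner NOC phone bridge"],
)
print(missing_items(cr))  # rollback steps and decision authority are not yet defined
```

Blocking approval until the list is empty keeps a routine swap from turning into an outage with no rollback plan and no one empowered to call it.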

Agreements: MOU, MOA, SLA, and BPA

These documents are frequently confused on the exam. Learn the distinction by formality and specificity:

Document | Full name | What it does
MOU | Memorandum of Understanding | Records broad mutual understanding, shared goals, intent to cooperate; often non-binding
MOA | Memorandum of Agreement | Defines specific agreed responsibilities, actions, resources, and timelines
SLA | Service Level Agreement | Sets measurable commitments: uptime, response times, remedies/penalties
BPA | Business Partner Agreement | Governs a joint business relationship and how partners interact

For CC purposes, the key is that an MOU is the looser "we understand each other," an MOA pins down "who does what," and an SLA makes commitments measurable. In shared infrastructure these define who monitors a link, who calls the provider, who responds to incidents, and what notice precedes maintenance.
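
The coordination questions an agreement should settle can be written as a duty-to-owner table and checked for gaps. In this hypothetical sketch, the duties come from the paragraph above. The party names, the `review_assignments` helper, and the rule of exactly one accountable owner per duty are simplifying assumptions; joint ownership can work if the agreement also names who leads.

```python
# Hypothetical duty assignments for a link shared by two organizations.
# The goal is exactly one accountable owner per duty.
DUTIES = {
    "monitor link": ["city_netops"],
    "open ticket with provider": [],
    "respond to incidents": ["city_netops", "county_it"],
    "give maintenance notice": ["county_it"],
}


def review_assignments(duties):
    """Split duties into unowned gaps and ambiguous shared ownership."""
    gaps = [duty for duty, owners in duties.items() if not owners]
    ambiguous = [duty for duty, owners in duties.items() if len(owners) > 1]
    return gaps, ambiguous


gaps, ambiguous = review_assignments(DUTIES)
print("No owner:", gaps)
print("More than one owner:", ambiguous)
```

Either finding is the kind of gap where "each side assumes the other will act" takes root.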

Worked Scenarios

Scenario 1: A city agency and county partner share a "redundant" emergency link. When packet loss appears, each side assumes the other will open a ticket; hours pass. The technical design was fine; the coordination failed. A clear MOA assigning monitoring duty, escalation contacts, and response-time expectations fixes it.

Scenario 2: Two circuits from two carriers enter through the same trench; a backhoe cuts both. The lesson is to address common-mode failure with diverse carriers, diverse paths, documented failover, and periodic testing.

On the exam, never treat resilience as a single product purchase — it is architecture, physical diversity, monitoring, agreements, testing, and people who know what to do.

Failover Modes and Why They Differ

Not all redundancy responds at the same speed, and the exam may probe the difference. Active-active designs run two or more nodes carrying live traffic at once; if one fails, the others absorb the load with little or no interruption, and capacity is used efficiently. Active-passive (hot standby) keeps a second node ready but idle until the primary fails, then promotes it. Warm and cold standby trade cost for recovery speed: a warm site has equipment and recent data ready to start, while a cold site is space and power that must be built out before use.

Map these to RTO — active-active yields the shortest recovery time, a cold standby the longest. Load balancers, HA clustering, and health checks are the machinery that detects failure and redirects traffic.
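
Load balancers and HA pairs typically probe the primary on a fixed interval and fail over only after several consecutive failed checks. The Python sketch below is a hypothetical simulation of that detect-and-promote logic for an active-passive pair, not any vendor's algorithm. The `choose_active` name, the threshold of 3, and the choice to leave failback manual are illustrative assumptions. Detection time adds directly to recovery time: with, say, a 10-second probe interval and a threshold of 3, promotion starts roughly 30 seconds after the primary goes down.

```python
def choose_active(primary_checks, threshold=3):
    """Replay primary health-check results and report which node serves traffic.

    primary_checks: sequence of booleans (True = primary passed the check).
    threshold: consecutive failures required before promoting the standby;
    requiring several avoids flapping on a single dropped probe.
    Returns the active node after each check.
    """
    active = "primary"
    consecutive_failures = 0
    timeline = []
    for healthy in primary_checks:
        if healthy:
            consecutive_failures = 0
        else:
            consecutive_failures += 1
        if active == "primary" and consecutive_failures >= threshold:
            # Promote the standby. Failing back is left as a manual,
            # change-controlled step in this sketch.
            active = "standby"
        timeline.append(active)
    return timeline


print(choose_active([True, False, True, False, False, False, True]))
```

The single failed probe early in the sequence does not trigger failover; the three consecutive failures do, and the standby stays active even after the primary recovers.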

Agreement Pitfalls and Vendor Oversight

A frequent CC theme is that agreements only help if they are read, current, and enforced. An SLA that promises 99.9 percent uptime but excludes scheduled maintenance, or that offers only a service credit rather than a meaningful remedy, may not match business expectations, so review coverage, exclusions, and remedies before signing. With partners and vendors, define incident-notification timeframes (how fast must they tell you about a breach?), data-handling duties, right-to-audit clauses, and end-of-relationship data return or destruction.
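
SLA percentages are easier to judge once converted to minutes: 99.9 percent over a 30-day month allows about 43 minutes of downtime. The sketch also shows how a maintenance exclusion changes the reported number. It assumes excluded maintenance is simply not counted as downtime; real SLAs define their measurement windows in different ways, so the contract's own definitions govern. The helper names and the outage and maintenance figures are invented for illustration.

```python
def allowed_downtime_minutes(uptime_pct, period_days=30):
    """Minutes of downtime permitted by an uptime percentage over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_pct / 100)


def measured_uptime_pct(period_days, outage_min, maintenance_min, exclude_maintenance):
    """Reported uptime, with scheduled maintenance either excluded or counted.

    Assumption: an excluded maintenance window is treated as if the service
    were up; some contracts instead shrink the measurement period.
    """
    total_minutes = period_days * 24 * 60
    down = outage_min if exclude_maintenance else outage_min + maintenance_min
    return 100 * (total_minutes - down) / total_minutes


# Hypothetical month: 40 minutes of unplanned outage plus 240 minutes of maintenance.
budget = allowed_downtime_minutes(99.9)
as_reported = measured_uptime_pct(30, 40, 240, exclude_maintenance=True)
as_experienced = measured_uptime_pct(30, 40, 240, exclude_maintenance=False)
print(f"Allowed downtime at 99.9%: {budget:.1f} min")
print(f"Reported uptime: {as_reported:.2f}% (SLA met: {as_reported >= 99.9})")
print(f"Experienced uptime: {as_experienced:.2f}%")
```

With those assumed figures, the provider reports about 99.91 percent and meets the target, while users actually experienced 280 minutes of unavailability, about 99.35 percent.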

When a shared circuit or service fails during an emergency, teams should already know contacts and responsibilities, not discover them under pressure.

Finally, distinguish the documents cleanly under exam pressure. If the question stresses "broad intent / mutual understanding," choose MOU. If it stresses "specific responsibilities, actions, who does what," choose MOA. If it stresses "measurable uptime, response times, penalties," choose SLA. If it stresses "governing a joint business relationship," choose BPA. Pairing the keyword in the stem to the document's defining trait is the fastest path to the right answer.

Test Your Knowledge

An organization buys two internet circuits from two different providers, but both fiber cables enter the building through the same underground conduit. What is the primary weakness?

Two agencies share a network link and want a document that specifies exactly who monitors the connection, who escalates incidents, and within what response time. Which document best fits?

Why should failover procedures be exercised before a real outage occurs?
