The Testing Illusion: Why Your BCM Exercises Are Preparing You for Disruptions That No Longer Happen

Business Continuity Management | William C Hord

This is the fourth and final article in a series that has traced one persistent thread through the state of business continuity management at financial institutions in 2026.

The first article established that execution — not documentation — is where BCM programs now fail. Plans exist. The ability to operate under actual stress, adapt under real pressure, and prove resilience to examiners is what breaks down. The second article confronted the reality that cyber risk is no longer an IT problem. A ransomware event is an operational shutdown — one that propagates through exactly the kind of dependency chains that most BCM programs were not designed to manage. The third article named what drives both failures: a dependency blindspot. Most institutions know what systems they operate. Very few know what those systems actually rest on, and their plans are built on assumptions about operational independence that the cloud, SaaS, and integration era has made obsolete.

Each of those articles identified a failure in how institutions build and maintain their BCM programs.

This final article examines how they test them — and why the gap between how testing is conducted and what regulators and real-world disruptions now require is the compounding problem that makes everything else worse.

The Compliance Test That Doesn't Test Anything

Most financial institutions conduct BCM testing. They complete the exercise, document the results, present findings to the board, and satisfy the examination requirement. By all formal measures, the box is checked.

And in most cases, the exercise has not actually validated whether the institution can survive a disruption.

This is the testing illusion: the appearance of validation without the substance of it.

It emerges from a gap between what testing was designed to do and what the current disruption environment requires. Traditional BCM testing was built around a set of assumptions — bounded failures, predictable scenarios, independent system recovery — that were reasonable when they were embedded into testing frameworks but no longer reflect how operational failures actually occur.

The institution runs the test. The test passes. The disruption arrives. The plan fails.

Not because the testing was dishonest. Because the testing was designed for a different kind of failure.

What Regulators Actually Require

The Federal Financial Institutions Examination Council's Business Continuity Management booklet — which replaced the previous Business Continuity Planning guidance and reflects the shift from plan documentation to resilience management — establishes the current examination standard for BCM exercises and tests at financial institutions of all sizes.¹

The FFIEC is explicit. Management must develop realistic exercise and test scenarios, based on identified risks, that simulate disruptions in business functions and help management determine the ability to meet both business requirements and customer expectations. Scenarios must include threats that could affect third-party service providers and significant business partners. Exercises and tests must include communication processes with applicable stakeholders. And critically, the FFIEC states the goal should not be to execute exercises without issues — it should be to continuously strengthen the BCM program and validate that it actually works.²

The FFIEC classifies exercises along a complexity spectrum: tabletop exercises at the foundational level, walk-through drills where teams execute initial steps, functional drills where a single function executes recovery, and full-scale exercises encompassing enterprise-wide activation.³ Each level serves a different validation purpose. The FFIEC expects institutions to progress in complexity commensurate with their risk profile and operational environment.

The Federal Reserve, OCC, and FDIC's Interagency Paper on Sound Practices to Strengthen Operational Resilience — SR 20-24 — extends this expectation to the largest and most complex institutions, requiring that information systems supporting critical operations and core business lines be subject to programs that are regularly tested, with testing that includes dependency recovery, not just primary system recovery.⁴

What both frameworks have in common is an expectation that testing validates actual recovery capability — not that it demonstrates familiarity with documented procedures.

Those are not the same thing. And the gap between them is where most BCM testing currently lives.

The Four Patterns That Define Testing Maturity Failure

Across BCM programs at financial institutions, four patterns appear consistently in programs that pass their examinations but fail under actual disruption conditions.

Pattern one: The same scenario, repeated annually.

The most common testing failure is not that institutions don't test — it's that they test the same scenario every year, often with the same participants, in the same sequence, producing the same after-action report with incrementally updated action items that are never fully resolved before the next exercise.

The FFIEC is clear that scenarios should be based on current risks identified through the institution's risk assessment and business impact analysis.² If ransomware is the top-rated operational risk in the institution's current risk environment — and for most financial institutions it is — the scenario should simulate ransomware, not a generic data breach or a facility loss that hasn't been the primary risk driver for years.

The repetition problem is not laziness. It is institutional inertia combined with the absence of a structured mechanism for connecting scenario selection to the current risk landscape. When testing is disconnected from risk assessment, the scenario ages without anyone noticing.

Pattern two: Exercises without injects.

A tabletop exercise that follows a linear narrative — disruption occurs, team responds, services restored, exercise concludes — tests the team's familiarity with the plan. It does not test their ability to make decisions under conditions that don't follow the plan.

Real disruptions do not follow scripts. They introduce complications mid-event: the vendor who was supposed to provide alternate processing capacity cannot be reached; the identity environment that recovery access depends on is also compromised; the communication channel the team planned to use is unavailable; the senior decision-maker whose authority is required for an action is not available.

The FFIEC specifically expects that exercises include escalation procedures and the ability to adjust for simulated scenarios.¹ Injects — pre-scripted complications introduced at key decision points during the exercise — are the mechanism for testing whether teams can actually respond to a disruption as it evolves, rather than as it was anticipated.

Programs that conduct exercises without injects are testing plan knowledge. They are not testing operational resilience.

Pattern three: Testing in isolation from dependencies.

The third article in this series documented in detail how dependency chains — cloud platforms, identity providers, third-party integrations — are the actual source of cascading failures in modern operational disruptions. The testing equivalent of that blindspot is testing the recovery of primary systems without testing the availability of the dependencies those systems require to recover.

Recovery strategies assume that the team can authenticate to recovery environments, access backup systems, and execute restoration procedures. All of those assumptions rest on dependencies — identity and access management infrastructure, network connectivity, vendor-provided systems — that may themselves be affected by the disruption being simulated.

Testing that validates primary system recovery without introducing dependency failures is testing the easy scenario. It is not testing the scenario that will actually occur.

Pattern four: After-action reports that don't change the plan.

The FFIEC requires that exercise and test results be documented, that issues identified through exercises have action plans with target dates for resolution, and that management update the BCM program based on findings.¹ The examination procedures explicitly instruct examiners to verify that corrective actions have been implemented and that retesting occurs to address deficiencies.⁵

In practice, most institutions produce after-action reports. Far fewer close the loop. Findings are documented, assigned, and then carried forward from one exercise cycle to the next without resolution — either because the action items are deprioritized against operational demands, or because the mechanism for tracking them to completion does not exist in the institution's BCM program.

An exercise that produces findings that are never resolved has not strengthened the BCM program. It has documented its weaknesses.

What Severe But Plausible Actually Means in 2026

The phrase "severe but plausible" appears across regulatory guidance as the standard for BCM scenario design. It is often interpreted as a moderating instruction — not too catastrophic, not too simple, something in between. That interpretation misses the point.

Severe but plausible means the scenario should represent the actual risk environment the institution faces, at the upper bound of what could realistically occur. In 2026, that standard produces scenarios that look very different from the scenarios most institutions currently test.

A severe but plausible cyber scenario for a financial institution in 2026 is not a generic data breach. It is a ransomware event that compromises the identity environment, locks the team out of cloud-hosted systems, and spreads through third-party integrations to affect multiple critical services simultaneously — consistent with the attack patterns the OCC's Spring 2026 Semiannual Risk Perspective identifies as active risks in the current environment.⁶

A severe but plausible vendor scenario is not a vendor experiencing a service degradation. It is a cloud infrastructure provider experiencing a regional outage that affects multiple critical vendors simultaneously, collapsing several recovery assumptions at once — the dependency concentration problem the third article in this series described as a blindspot that most BCM programs have never mapped, let alone tested.

A severe but plausible operational scenario is not a branch flooding. It is a combination event: a cyber disruption coinciding with a third-party failure, affecting staffing access and system availability simultaneously. This is the kind of event the Interagency Paper on Sound Practices to Strengthen Operational Resilience identifies as the failure mode that traditional recovery strategies were not designed to address.⁴

The reason most institutions don't test scenarios at this level of severity is not that the guidance doesn't require it. It is that their testing programs were designed when "severe but plausible" meant something less severe. The disruption environment has moved. Their testing programs have not.

The Three Shifts That Define Testing Maturity

Moving from compliance-level testing to testing that genuinely validates operational resilience requires three structural changes — each of which builds on what the previous articles in this series established.

First: Connect scenario selection to the current risk register and BIA.

This is the foundational requirement, and the one most consistently absent. The FFIEC is explicit that scenarios should be based on identified risks.² If the institution's risk assessment identifies ransomware, cloud concentration, and third-party integration failure as the top operational risks — which most current assessments should, given the regulatory signals documented in the second article in this series — then the BCM exercise program should be testing those scenarios, not scenarios left over from a risk environment that no longer exists.

This connection requires that the BCM program and the enterprise risk management program share a taxonomy and an information flow. Scenarios cannot be selected based on current risks if the risk information does not reach the BCM team in a form they can act on.
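As a rough illustration of that information flow, the sketch below checks whether the top-rated risks in a risk register are covered by any planned exercise scenario. Everything in it is an assumption made for illustration: the 1-to-5 rating scale, the risk names, the scenario title, and the function name untested_top_risks are hypothetical and not drawn from the FFIEC booklet or any particular GRC tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    rating: int  # hypothetical scale: 1 (lowest) to 5 (highest)


@dataclass(frozen=True)
class Scenario:
    title: str
    risks_covered: frozenset  # names of the risks this exercise simulates


def untested_top_risks(risk_register, scenarios, top_n=3):
    """Return the names of the top_n highest-rated risks that no scenario covers."""
    ranked = sorted(risk_register, key=lambda r: r.rating, reverse=True)
    top_names = [r.name for r in ranked[:top_n]]
    covered = set()
    for scenario in scenarios:
        covered |= scenario.risks_covered
    return [name for name in top_names if name not in covered]


# Illustrative data only: a register led by the risks the article names,
# and an exercise plan that still tests a legacy facility scenario.
risk_register = [
    Risk("ransomware", 5),
    Risk("cloud concentration", 4),
    Risk("third-party integration failure", 4),
    Risk("facility loss", 2),
]
scenarios = [
    Scenario("Annual branch flood tabletop", frozenset({"facility loss"})),
]

gaps = untested_top_risks(risk_register, scenarios)
print(gaps)
```

With this sample data, all three top-rated risks come back as untested, which is the aging-scenario pattern described earlier. The point of the sketch is the shared vocabulary: the check only works if the risk register and the exercise plan use the same risk names.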

Second: Test the dependency chain, not just the primary system.

The dependency blindspot documented in the third article is not only a planning problem. It is a testing problem. An institution cannot know whether its recovery strategies are achievable under real disruption conditions until it has tested those strategies under conditions that include the failure of the dependencies the strategies rely on.

This means exercises must introduce dependency failures as injects — cloud platform unavailability, identity provider compromise, third-party integration outage — and evaluate whether recovery strategies remain executable when supporting infrastructure is also affected. If recovery only works when the dependencies are functioning, the plan is not resilient. It is conditional.
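One way to reason about "conditional" recovery before an exercise is to write the dependency map down and ask which recovery strategies stop being executable when a given inject is applied. The sketch below is a minimal illustration under stated assumptions: the dependency map, the strategy names, and the helper functions is_executable and blocked_strategies are all hypothetical, and a real map would come from the institution's own dependency analysis.

```python
# Hypothetical dependency map: each item lists what it needs in order to work.
DEPENDS_ON = {
    "restore core banking": ["backup platform", "identity provider"],
    "backup platform": ["cloud region A", "identity provider"],
    "customer notifications": ["email SaaS"],
    "email SaaS": ["cloud region A"],
    "manual wire procedure": [],
    "identity provider": [],
    "cloud region A": [],
}


def is_executable(item, failed, depends_on=DEPENDS_ON, _seen=None):
    """True if the item and everything it transitively depends on is unaffected."""
    _seen = set() if _seen is None else _seen
    if item in failed:
        return False
    if item in _seen:  # already visited in this walk; also guards against cycles
        return True
    _seen.add(item)
    return all(
        is_executable(dep, failed, depends_on, _seen)
        for dep in depends_on.get(item, [])
    )


def blocked_strategies(strategies, inject):
    """Return the recovery strategies that cannot run while the inject is in effect."""
    failed = set(inject)
    return [s for s in strategies if not is_executable(s, failed)]


strategies = ["restore core banking", "customer notifications", "manual wire procedure"]

identity_inject = blocked_strategies(strategies, {"identity provider"})
cloud_inject = blocked_strategies(strategies, {"cloud region A"})
print(identity_inject)
print(cloud_inject)
```

In this toy map, an identity provider compromise blocks core banking restoration, while a regional cloud outage blocks both core banking restoration and customer notifications. Only the manual procedure survives either inject, which is the kind of result an exercise should then confirm or refute in practice.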

Third: Close the loop between findings and program improvement.

After-action reports that produce action items without accountability mechanisms are documentation of failure, not management of it. A testing program that continuously generates findings without resolving them is, over time, a program that knows its own weaknesses and has chosen not to address them.

The FFIEC's examination procedures verify that corrective actions have been implemented and that retesting validates the resolution.⁵ The institutions that hold up under examination scrutiny are the ones that have built the mechanism for this — a structured process that connects exercise findings to program updates to validation testing, with ownership, timelines, and board-level visibility into whether the loop is being closed.
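A minimal sketch of such a tracking mechanism follows. It treats a finding as closed only when it has been both resolved and retested, and flags anything past its target date. The field names, status labels, sample findings, owners, and dates are hypothetical, chosen only to illustrate the idea; they are not taken from the FFIEC examination procedures.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Finding:
    description: str
    owner: str
    target_date: date
    resolved_on: Optional[date] = None
    retested_on: Optional[date] = None

    def status(self, today: date) -> str:
        """Classify the finding; 'closed' requires resolution AND a retest."""
        if self.resolved_on and self.retested_on:
            return "closed"
        if self.resolved_on:
            return "awaiting retest"
        if today > self.target_date:
            return "overdue"
        return "open"


# Illustrative findings and dates only.
today = date(2026, 6, 1)
findings = [
    Finding("Recovery access depended on the compromised identity provider",
            "IAM lead", date(2026, 3, 31)),
    Finding("Vendor contact list out of date",
            "Vendor manager", date(2026, 4, 30), resolved_on=date(2026, 4, 15)),
    Finding("No alternate communication channel",
            "BCM officer", date(2026, 5, 15),
            resolved_on=date(2026, 5, 1), retested_on=date(2026, 5, 20)),
    Finding("Decision authority unclear when the executive is unavailable",
            "COO", date(2026, 9, 30)),
]

statuses = [f.status(today) for f in findings]
summary = Counter(statuses)  # a simple roll-up suitable for board reporting
print(statuses)
print(dict(summary))
```

The design choice worth noting is the separate "awaiting retest" state: a finding marked resolved but never revalidated stays visible instead of silently dropping off the report.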

The Series in Summary

Four articles. One persistent challenge with four dimensions.

The first article established that BCM execution — not planning — is the failure point. Plans exist and have always existed. The ability to operate under actual stress, make real-time decisions, and prove resilience across interconnected systems is what the current examination standard tests — and what most programs were not built to demonstrate.

The second article established that cyber risk is now a continuity crisis. A ransomware event is not a data loss incident. It is an operational shutdown that propagates through dependency chains, compromises recovery environments, and produces exactly the kind of cascading failure that traditional BCM scenarios were never designed to test.

The third article established that dependency chains are invisible in most BCM programs. Cloud platforms, identity providers, and third-party integrations create second- and third-order failure modes that bounded-failure recovery strategies cannot address, and that no institution can manage if it hasn't mapped them.

This fourth article establishes that testing — the mechanism that should reveal all of these gaps — is itself subject to a maturity failure. Institutions run exercises that test plan knowledge rather than operational resilience. They test scenarios that don't reflect the current risk environment. They don't introduce the dependency failures that would expose whether recovery strategies actually work. And they produce after-action findings they never fully resolve.

The result is a BCM program that, in examination, appears sound — and, under actual disruption, breaks at the seams.

The Honest Assessment

The question for every BCM officer at a financial institution is not whether you conduct exercises. You do. The question is what your exercises actually test.

If the last exercise used a scenario that wasn't drawn from your current risk assessment, it tested familiarity with documented procedures — not the risks you actually face.

If the exercise didn't include injects that forced real decision-making under evolving conditions, it tested your team's ability to follow a script — not their ability to lead a response.

If the scenario didn't introduce dependency failures (cloud unavailability, identity compromise, third-party outage), it tested recovery in ideal conditions, not the conditions under which failures actually occur.

If the findings from the last exercise haven't been fully resolved and validated, the exercise produced documentation — not improvement.

The gap between a BCM exercise that passes examination and a BCM program that produces genuine resilience is the gap between testing for compliance and testing for capability.

Regulators are increasingly positioned to see that gap. Disruptions expose it in real time.

The institutions that close it will do so not by running more exercises, but by running better ones — grounded in current risks, tested against realistic dependency failures, and connected to a program improvement cycle that actually closes.

References

¹ Federal Financial Institutions Examination Council. Business Continuity Management, IT Examination Handbook. November 2019 (updated). ithandbook.ffiec.gov

² Federal Financial Institutions Examination Council. BCM IT Examination Handbook, Section VII.F: Exercise and Test Scenarios. ithandbook.ffiec.gov

³ Federal Financial Institutions Examination Council. BCM IT Examination Handbook, Section VII: Exercises and Tests. ithandbook.ffiec.gov

⁴ Board of Governors of the Federal Reserve System, Federal Deposit Insurance Corporation, and Office of the Comptroller of the Currency. Interagency Paper on Sound Practices to Strengthen Operational Resilience (SR 20-24). October 2020. federalreserve.gov

⁵ Federal Financial Institutions Examination Council. BCM IT Examination Handbook, Appendix A: Examination Procedures. ithandbook.ffiec.gov

⁶ Office of the Comptroller of the Currency. Semiannual Risk Perspective, Spring 2026. May 2026. occ.treas.gov

This is the fourth and final article in a series on the evolving state of business continuity management at financial institutions. Previous pieces examined BCM execution under stress, cyber risk as a continuity crisis, and dependency chain blindspots in BCM planning.