Resilience Analysis
The Wrong Question About Failure
Planners of space architectures spend most of their risk-analysis effort on the wrong question. They ask how to prevent failure: how to harden a satellite against radiation, how to secure a ground station against intrusion, how to reduce the probability of a supply-chain disruption. These are reasonable questions, and the discipline that answers them is mature. But they are not the decisive questions for critical space infrastructure.
The decisive questions are what happens after the failure that cannot be prevented: how badly does the system degrade when a ground station goes offline, how long until partial service returns, how does the loss of a single vendor cascade through the constellation, how does graceful degradation play out when two disruptions arrive within the same recovery window. These are resilience questions, not risk questions, and they have a different analytical shape.
Consider a GNSS augmentation service that has been hardened, audited, and insured against every plausible threat. Its preventive posture is excellent. A targeted cyber intrusion nonetheless succeeds against one of its three redundant ground stations, and the recovery team discovers that the hardware replacement supply chain depends on a single vendor with an eighteen-month lead time. The operator did not fail to prevent the intrusion in any culpable sense; it failed to think through what the eighteen months after the intrusion would look like. The preventive posture was thick, the resilience posture was thin, and the gap between them produced the service outage that mattered.
Resilience analysis is the discipline of the second question. It accepts that disruptions will happen — some predictable, some not — and examines how well the system performs once they have happened. For critical space services, whose failure propagates into terrestrial consequences, it is the analysis whose absence is most consistently felt after a bad event.
Ecology, Infrastructure, and Complex Systems
The intellectual roots of resilience thinking are older than space and older than security. Crawford Stanley Holling, writing in ecology in 1973, introduced a distinction that the field has lived with ever since: stability is the property of a system that returns quickly to equilibrium after disturbance, while resilience is the property of a system that absorbs disturbance and maintains its essential structure even when the equilibrium itself shifts. Ecological systems, Holling argued, can be unstable and resilient simultaneously, flipping between configurations without losing coherence. The insight disturbed an engineering tradition that had conflated the two concepts.
The critical-infrastructure community absorbed resilience thinking in the wake of the security shocks of the early 2000s, when it became clear that defensive postures built against specific threats were producing systems that were brittle to unanticipated ones. The United States, through its Department of Homeland Security, produced a series of frameworks from the mid-2000s that placed resilience alongside protection as a co-equal objective. The European Union’s NIS Directive and its successors pushed in a similar direction. By the 2010s, resilience had become a standard category in critical-infrastructure analysis, with recognised subdomains (energy, transport, telecommunications) and a shared vocabulary.
Complex adaptive systems theory, drawing on work at the Santa Fe Institute and elsewhere, supplied the third strand. Systems with many interacting components, feedback loops, and emergent behaviour were shown to exhibit distinctive failure patterns: cascading propagation, phase transitions, brittle equilibria that hid fragility until a trigger arrived. The vocabulary of the field — tipping points, feedback loops, emergence — entered resilience practice and shaped the analytical questions that mature resilience analysis now asks.
For space systems, the intellectual convergence matters because space architectures are, in the relevant sense, all three things at once: they are ecosystems of interacting sensors, buses, links, and ground infrastructure; they are critical infrastructure whose degradation imposes costs far beyond the operator; and they are complex adaptive systems whose behaviour under stress is not always predictable from component-level specifications. Each of the three lineages supplies a necessary part of the analytical vocabulary.
The Characteristic Move
Resilience analysis differs from risk analysis in that it reverses the assumption about failure. Risk analysis asks how likely an adverse event is and what its magnitude would be, and from this produces a prevention priority. Resilience analysis accepts that the event will happen (not because every event is inevitable, but because the portfolio of possible events is too large for prevention alone to cover) and asks what happens after.
The first analytical move is the definition of critical function. A resilience analysis begins with a statement of what the system must do to be considered operational. “GNSS augmentation” is not specific enough; “position accuracy within a defined tolerance, available to users in a defined region, with continuity-of-service guarantees” is specific enough. The performance threshold defines what “acceptable operation” means, and the analysis that follows is an assessment of how well the system maintains that threshold under stress. Analysts who skip this step produce resilience findings that float detached from consequences.
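One way to make the critical-function statement concrete is to write it down as explicit thresholds and check observed performance against them. The following is a minimal sketch under stated assumptions: the class names, field names, and all numeric values are hypothetical placeholders, not figures from any real augmentation service or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalFunction:
    """Thresholds that define 'acceptable operation' (illustrative fields)."""
    max_position_error_m: float   # accuracy tolerance
    min_availability: float       # fraction of time service is usable
    max_outage_minutes: float     # longest tolerated continuous outage


@dataclass(frozen=True)
class Observation:
    """Performance observed (or simulated) under one disruption scenario."""
    position_error_m: float
    availability: float
    longest_outage_minutes: float


def breaches(fn: CriticalFunction, obs: Observation) -> list[str]:
    """Return the names of the thresholds the observation violates."""
    violated = []
    if obs.position_error_m > fn.max_position_error_m:
        violated.append("accuracy")
    if obs.availability < fn.min_availability:
        violated.append("availability")
    if obs.longest_outage_minutes > fn.max_outage_minutes:
        violated.append("continuity")
    return violated


# Placeholder numbers, chosen only to exercise the check.
critical_function = CriticalFunction(1.0, 0.999, 15.0)
under_stress = Observation(0.8, 0.995, 240.0)
violated = breaches(critical_function, under_stress)
```

With thresholds stated this way, each later finding can be tied to a specific breach rather than to a general impression of degradation.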
The second move is the scenario set. A resilience analysis does not evaluate the system against a single disruption; it evaluates against a portfolio of disruptions that exercise different vulnerabilities. The portfolio typically spans kinetic disruption, cyber compromise, supply-chain disruption, space weather, regulatory change, market failure, and, critically, slow-onset chronic stresses such as debris accumulation or workforce attrition. The scenario set is deliberate and disciplined, not a laundry list. Each scenario is specified with enough detail that the system’s response can be assessed.
The third move is the decomposition into absorptive, adaptive, and recovery capacities. Each capacity is assessed independently for each scenario, and the three together form the resilience scorecard.
| Capacity | Question it answers | Typical indicators |
|---|---|---|
| Absorptive | Can the system withstand the initial impact without degrading function? | Redundancy, diversity, robustness, buffering margins |
| Adaptive | Can the system reconfigure under stress? | Flexibility in routing, decision speed, interoperability, graceful-degradation paths |
| Recovery | How quickly and completely does the system return to acceptable operation? | Reconstitution plans, supply-chain depth, recovery-time objectives |
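The decomposition in the table can be sketched as a small data structure: one row per scenario, one rating per capacity. This is an illustration only. The three-level ordinal scale, the class and function names, and the example ratings are all assumptions introduced here, not part of the method as described.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rating(IntEnum):
    """Ordinal rating; the three-level scale is an illustrative assumption."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass(frozen=True)
class CapacityAssessment:
    """Absorptive, adaptive, and recovery ratings for one scenario."""
    scenario: str
    absorptive: Rating
    adaptive: Rating
    recovery: Rating

    def weakest(self) -> str:
        """Name of the lowest-rated capacity (first one listed on ties)."""
        ratings = {
            "absorptive": self.absorptive,
            "adaptive": self.adaptive,
            "recovery": self.recovery,
        }
        return min(ratings, key=ratings.get)


def scorecard_summary(rows):
    """For each capacity, report the worst rating seen across scenarios."""
    return {
        name: min(getattr(row, name) for row in rows).name
        for name in ("absorptive", "adaptive", "recovery")
    }


# Hypothetical ratings, for illustration only.
rows = [
    CapacityAssessment("cyber compromise", Rating.HIGH, Rating.MODERATE, Rating.LOW),
    CapacityAssessment("space weather", Rating.HIGH, Rating.MODERATE, Rating.MODERATE),
]
summary = scorecard_summary(rows)
```

Summarising by the worst rating per capacity is one possible design choice; it keeps the three capacities separate instead of collapsing them into a single number.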
The fourth move is the failure propagation analysis. Space architectures contain single points of failure and cascading dependencies, and the analysis is incomplete until these are identified. A shared ground station whose loss degrades multiple services; a common software stack whose compromise propagates across a constellation; a sole-source vendor whose disruption creates an eighteen-month recovery gap — these are the hidden failure modes that component-level analysis tends to miss.
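A simple way to begin surfacing such shared dependencies is to walk a dependency map and list every component that more than one service relies on. The sketch below assumes a deliberately simplified model in which every listed dependency is required (no redundancy is modelled); the dependency map and all component names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical dependency map: each entry lists what the key directly
# depends on. Every dependency is treated as required (no redundancy).
DEPENDS_ON = {
    "aviation_service": ["ground_station_a", "software_stack"],
    "maritime_service": ["ground_station_a", "software_stack"],
    "agriculture_service": ["ground_station_b", "software_stack"],
    "ground_station_a": ["vendor_x_hardware"],
    "ground_station_b": ["vendor_x_hardware"],
}
SERVICES = ["aviation_service", "maritime_service", "agriculture_service"]


def transitive_dependencies(node, depends_on):
    """Return every component reachable from `node` via dependency edges."""
    seen = set()
    stack = list(depends_on.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(depends_on.get(current, []))
    return seen


def shared_failure_points(services, depends_on):
    """Map each component to the services that rely on it, keeping only
    components that more than one service depends on."""
    impacted = defaultdict(set)
    for service in services:
        for component in transitive_dependencies(service, depends_on):
            impacted[component].add(service)
    return {c: sorted(s) for c, s in impacted.items() if len(s) > 1}


points = shared_failure_points(SERVICES, DEPENDS_ON)
```

In this toy map the sole-source hardware vendor sits behind every service even though no service names it directly, which is the kind of indirect dependency that component-level review tends to overlook.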
The fifth move is the comparative scorecard. Resilience scores are meaningful only in relation to alternatives or benchmarks. A single number — “this system has resilience score 7” — is not useful. A comparison — “this architecture has higher absorptive capacity than its predecessor but lower recovery capacity than a distributed alternative would” — is useful. The method is comparative by construction, and analysts who produce absolute scores have misapplied it.
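The comparative character of the scorecard can be illustrated with a function that emits only relative statements and never a single total. This is a hedged sketch: the ordinal scores (1 to 3), the architecture names, and the function name are hypothetical.

```python
CAPACITIES = ("absorptive", "adaptive", "recovery")


def compare(name_a, scores_a, name_b, scores_b):
    """Return one relative statement per capacity; no absolute total."""
    findings = []
    for capacity in CAPACITIES:
        a, b = scores_a[capacity], scores_b[capacity]
        if a == b:
            findings.append(
                f"{name_a} and {name_b} have comparable {capacity} capacity"
            )
        else:
            relation = "higher" if a > b else "lower"
            findings.append(
                f"{name_a} has {relation} {capacity} capacity than {name_b}"
            )
    return findings


# Hypothetical ordinal scores (1 = low, 3 = high), for illustration only.
current = {"absorptive": 3, "adaptive": 2, "recovery": 1}
distributed = {"absorptive": 2, "adaptive": 2, "recovery": 3}
findings = compare("current architecture", current,
                   "distributed alternative", distributed)
```

Because the output is a list of per-capacity comparisons, a trade-off such as stronger absorption but weaker recovery stays visible instead of being averaged away.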
What distinguishes resilience analysis from neighbouring methods is the structural acceptance of failure combined with the decomposition into absorptive, adaptive, and recovery capacities. Risk matrix assessment asks about the probability and severity of events; threat modelling enumerates attack paths; disruption theory asks whether a different architecture would be better. Resilience analysis is the one method whose explicit question is how well the current architecture performs after the disruption has arrived.
The Method at Work: A Regional GNSS Augmentation System
Consider a regional GNSS augmentation service whose users include aviation, maritime, and precision-agriculture customers. The critical function is specified: position accuracy to a defined integrity level, continuously available within the service region, with a maximum tolerated outage duration. The scenario set is deliberately diverse: a kinetic disruption to a space-segment asset, a cyber compromise targeting a ground-segment element, a supply-chain interruption affecting a critical hardware replacement, an extended space weather event, and chronic workforce attrition in the operating authority.
The absorptive-capacity assessment produces favourable findings. The ground segment operates with triple redundancy across geographically distributed stations. The space segment maintains operational margin sufficient to survive the loss of one or two assets. The power and communications infrastructure is hardened. Initial impact of most scenarios is absorbed without visible service degradation.
The adaptive-capacity assessment is more mixed. The system’s failover between ground stations is manual rather than automated, producing a response lag measured in hours rather than minutes. Interoperability with allied augmentation services exists at a technical level but has not been exercised operationally, so the adaptive path is theoretical rather than rehearsed. The result is adaptive capacity that looks acceptable on paper and would likely prove uneven under pressure.
The recovery-capacity assessment surfaces the decisive finding. The critical hardware elements at the ground stations depend on a single vendor with a limited production line. Replacement lead time, in the scenario where a ground station is physically damaged or cyber-compromised beyond remediation, is measured in months rather than weeks. The service could absorb the initial event; it would struggle to adapt around it within the service-level window; and it would recover slowly because the supply chain that restores full capacity is brittle. The resilience scorecard reads: absorptive capacity high, adaptive capacity moderate, recovery capacity low.
The failure-propagation analysis confirms the concern. A scenario in which a cyber compromise disables one ground station and a concurrent supply-chain disruption delays replacement produces a cascading exposure: degraded service for a period long enough to impose material operational costs on aviation and maritime users, with knock-on regulatory and insurance consequences. The compound scenario is more likely than simple probability calculations suggest, because both disruptions can be triggered by the same upstream event — a sophisticated adversary might engineer both simultaneously.
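The point about a shared upstream trigger can be shown with a small calculation. The sketch below assumes a simplified common-cause model: with probability `p_common` an upstream event triggers both disruptions, and otherwise each occurs independently. The function names and all probabilities are illustrative assumptions, not estimates for any real system.

```python
def marginal(p_common, p_alone):
    """Overall probability of one disruption, from either source."""
    return p_common + (1 - p_common) * p_alone


def joint_naive(p_common, p_cyber_alone, p_supply_alone):
    """Multiply the two marginals as if the disruptions were independent."""
    return marginal(p_common, p_cyber_alone) * marginal(p_common, p_supply_alone)


def joint_with_common_cause(p_common, p_cyber_alone, p_supply_alone):
    """Probability that both occur when a shared trigger can cause both."""
    return p_common + (1 - p_common) * p_cyber_alone * p_supply_alone


# Illustrative probabilities over some planning period (assumed values).
p_common, p_cyber_alone, p_supply_alone = 0.02, 0.05, 0.05

naive = joint_naive(p_common, p_cyber_alone, p_supply_alone)
actual = joint_with_common_cause(p_common, p_cyber_alone, p_supply_alone)
ratio = actual / naive
```

Under these assumed inputs the compound scenario comes out several times more likely than the product of the marginals suggests, and the gap widens as the shared trigger accounts for more of each disruption's probability.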
The analytical finding, and the value the method delivers, is that the system’s preventive posture is adequate and its resilience posture is structurally brittle. Further investment in hardening is subject to diminishing returns; investment in supply-chain diversification, automated failover, and exercised interoperability with allied services would produce much larger resilience gains per unit of resource. The recommendation is not “harden more” but “diversify and exercise.” That recommendation would not emerge from a traditional risk analysis, which is oriented toward prevention rather than performance-under-stress.
The scorecard itself becomes a shared analytical object. A downstream scenario-planning exercise consumes the finding about recovery weakness, branching on scenarios in which ground-station loss coincides with supply-chain disruption. A deterrence-escalation assessment references the same scorecard to ask whether the vulnerability creates an attractive target for an adversary seeking to impose disproportionate cost. A procurement review uses the scorecard as a decision criterion in choosing the next generation of ground-segment hardware. The resilience analysis is produced once and consumed repeatedly by methods whose questions it informs without duplicating.
Where It Holds, Where It Limps
Resilience analysis holds where the system is well-enough defined that its components, dependencies, and performance thresholds can be specified, and where the question is how it performs under disruption rather than whether disruption can be prevented. For critical space infrastructure whose degradation imposes broad costs, it is the analytical discipline that most reliably surfaces the brittleness hidden beneath preventive strength.
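Its limits are significant. It depends on a system that can be specified in this way, its scores are meaningful only in comparison with alternatives or benchmarks, and its characteristic blind spot is slow-onset fragility.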
Resilience analysis pairs naturally with scenario planning (which uses the scorecard as a stress-test baseline), with deterrence-escalation analysis (which uses vulnerability findings as adversary-incentive inputs), with investment analysis (which uses the scorecard as a risk-adjustment factor), and with geopolitical risk frameworks (which supply the scenario inputs the method tests against).
A Note for the Practitioner
Reach for resilience analysis when the strategic question is not whether disruption can be prevented but how well the system performs after disruption arrives. It is the appropriate lens for critical-infrastructure assessments, architectural comparisons, and investment reviews whose outcome depends on performance under stress.
Approach it with two rules. First, produce the analysis once, precisely, and expect downstream methods to consume the scorecard rather than reproduce it. Duplication of the absorb-adapt-recover assessment across methods hollows the output. Second, ensure the scenario set includes at least one chronic stress; the method’s characteristic blind spot is slow-onset fragility. Pair with scenario planning for the environmental-variation layer and with disruption theory for the architectural-alternative question. The operational version of the method, with its full scorecard and failure-mode protocol, is available in the method library for practitioners who need to apply it systematically to a specific system.
spacepolicies.org