No Formal Framework Exists to Measure or Verify Resilience of Autonomous Systems at Runtime
Problem Statement
Autonomous systems — surgical robots, self-driving vehicles, UAVs, smart grid controllers — must operate in open-world environments where models of the operating context are incomplete, conditions change unpredictably, and unforeseen disturbances occur. Current engineering practice verifies system safety at design time against a pre-specified set of scenarios, but design-time verification fundamentally cannot cover the open-ended conditions autonomous systems encounter in deployment. A Dagstuhl seminar convening international experts in autonomous systems, safety science, and formal methods concluded that no formal framework exists to define, measure, or verify system resilience at runtime — the ability to detect, absorb, and recover from unexpected disturbances during operation. Without runtime resilience metrics, there is no way to certify that an autonomous system will degrade gracefully rather than fail catastrophically when encountering conditions its designers did not anticipate.
Why This Matters
The deployment of autonomous systems is accelerating across safety-critical domains: aviation (autonomous air taxis, drone delivery), healthcare (surgical robots, autonomous drug delivery), transportation (autonomous vehicles, rail systems), and defense. Each domain requires assurance that systems can handle the unexpected, not just the anticipated. Current certification frameworks (DO-178C for aviation software, IEC 62304 for medical devices) assume design-time completeness — that all failure modes can be identified and mitigated before deployment. For systems operating in truly open worlds, this assumption is false. The consequence is either dangerous deployment without adequate assurance, or blocked deployment of beneficial technologies because no certification pathway accounts for runtime adaptation. The seminar described the emerging concept of "antifragility" — systems that not only survive disturbances but improve from them — but noted this concept lacks any formal engineering foundation.
What’s Been Tried
Runtime monitoring systems can detect some anomalies during operation but provide reactive detection, not proactive resilience — they catch problems after they occur rather than preventing degradation. Self-adaptive systems research has produced architectures (MAPE-K feedback loops, the Rainbow framework) for systems that modify their own behavior, but these architectures lack formal guarantees about the quality of adaptation and cannot prove that self-modification preserves safety properties. Fault-tolerant computing provides redundancy-based resilience but assumes known failure modes — it cannot handle truly novel disturbances. Control theory offers formal stability and recovery metrics (settling time, overshoot) that the seminar identified as a promising foundation, but these metrics have not been extended to the discrete, software-intensive, multi-objective decision-making of autonomous systems. The fundamental gap is between communities: control theorists, formal methods researchers, AI safety researchers, and domain-specific certification bodies are each working on fragments of the problem with different formalisms and different definitions of resilience.
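To make the adaptation-loop idea concrete, the sketch below shows a bare MAPE-K cycle (monitor, analyze, plan, execute over a shared knowledge base) in Python. The field names, threshold, and proportional correction are illustrative assumptions, not taken from the Rainbow framework or the seminar report, and the sketch deliberately exhibits the limitation described above: nothing in it guarantees that the adaptation is safe or of adequate quality.

```python
# Bare MAPE-K loop: Monitor, Analyze, Plan, Execute over a shared Knowledge base.
# All names, thresholds, and the proportional correction below are illustrative
# assumptions for this sketch, not taken from the seminar report or Rainbow.

from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Shared knowledge: the goal, an adaptation threshold, and observations."""
    target_speed: float = 1.0            # desired setpoint (m/s)
    max_deviation: float = 0.2           # tolerated deviation before adapting
    history: list = field(default_factory=list)


def monitor(reading: float, k: Knowledge) -> None:
    """Record the newest sensor observation."""
    k.history.append(reading)


def analyze(k: Knowledge) -> bool:
    """Decide whether the deviation from the goal warrants adaptation."""
    return bool(k.history) and abs(k.history[-1] - k.target_speed) > k.max_deviation


def plan(k: Knowledge) -> float:
    """Choose a corrective action: a simple proportional correction."""
    return 0.5 * (k.target_speed - k.history[-1])   # gain is arbitrary here


def execute(correction: float, speed: float) -> float:
    """Apply the planned correction to the actuator."""
    return speed + correction


# One pass of the loop; a deployed system would run this continuously.
k = Knowledge()
speed = 0.6                              # current, disturbed speed
monitor(speed, k)
if analyze(k):
    speed = execute(plan(k), speed)
print(f"adapted speed: {speed:.2f}")     # 0.80 with these illustrative values
```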
What Would Unlock Progress
- Formal metrics for resilience and antifragility grounded in control theory but applicable to software-intensive autonomous systems — the seminar proposed settling time, percentage of settling, and overshoot as starting points but acknowledged these need validation (see the sketch after this list).
- Engineering methods for building systems that can detect novel (previously unencountered) disturbances and improve their response over time without violating safety constraints.
- Standardized benchmark scenarios across application domains (healthcare, transportation, aviation) that allow comparison of different resilience approaches.
- A certification framework that explicitly accounts for runtime learning and adaptation — bridging the gap between traditional design-time assurance and the reality of open-world operation.
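As a rough illustration of the first item, the sketch below computes overshoot and settling time over a logged recovery trajectory. The 5% settling band and the example data are assumptions made here for illustration, not values proposed by the seminar.

```python
# Recovery metrics over a logged trajectory of a performance variable after a
# disturbance at t = 0. The 5% settling band and the example data are
# assumptions made for illustration, not values proposed by the seminar.

def overshoot(values, setpoint):
    """Largest excursion beyond the setpoint, as a fraction of the setpoint."""
    return max(0.0, (max(values) - setpoint) / setpoint)


def settling_time(times, values, setpoint, band=0.05):
    """First time after which the signal stays within +/- band of the setpoint."""
    lo, hi = setpoint * (1 - band), setpoint * (1 + band)
    for i, t in enumerate(times):
        if all(lo <= v <= hi for v in values[i:]):
            return t
    return None                          # never settles within the band


# Example: a quantity that dips after a disturbance and recovers toward 1.0.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
values = [1.00, 0.60, 0.85, 1.08, 1.03, 1.01, 1.00]

print("overshoot:", overshoot(values, setpoint=1.0))          # 0.08
print("settling time:", settling_time(times, values, 1.0))    # 2.0
```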
Entry Points for Student Teams
A team could implement a simulated autonomous system (e.g., a robot navigating a dynamic environment, or an autonomous drone maintaining formation) with a self-adaptation mechanism, then systematically inject increasingly novel disturbances (sensor degradation, unexpected obstacles, communication failures) and measure recovery using the proposed control-theory metrics (settling time, overshoot). The deliverable would be a quantitative evaluation of how well these metrics capture the intuitive concept of "resilience" and where they break down. This is feasible as a robotics/software engineering project with simulation tools like ROS/Gazebo or simple Python-based environments.
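A minimal stand-in for such an experiment, assuming a toy 1-D goal-tracking task in plain Python rather than ROS/Gazebo (the dynamics, controller gain, noise level, and disturbance model are all invented for this sketch):

```python
# Toy stand-in for the proposed experiment: a 1-D agent tracks a goal position,
# a sensor bias is injected partway through the run, and the trace is kept for
# scoring with the metrics above. Dynamics, gain, noise level, and the
# disturbance model are all invented for this sketch.

import random


def run_trial(steps=200, disturb_at=80, sensor_bias=0.5, gain=0.2, goal=1.0):
    """Simulate goal tracking with a sensor bias injected at step `disturb_at`."""
    pos, trace = 0.0, []
    for t in range(steps):
        bias = sensor_bias if t >= disturb_at else 0.0
        measured = pos + bias + random.gauss(0.0, 0.01)    # degraded sensor
        pos += gain * (goal - measured)                    # simple P-controller
        trace.append((t, pos))
    return trace


random.seed(0)
trace = run_trial()
after = [p for t, p in trace if t >= 80]
print("largest deviation from goal after disturbance:",
      round(max(abs(1.0 - p) for p in after), 3))
print("final position (goal = 1.0):", round(trace[-1][1], 3))
```

Because this baseline has no adaptation layer, the injected bias leaves a persistent offset from the goal. The project would add a self-adaptation mechanism, rerun the same disturbance schedule, and report how much of that offset is recovered and how quickly, using settling time and overshoot as the scoring metrics.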
Genome Tags
Source Notes
- The seminar brought together researchers from computer science, safety science, control theory, ethics, and industry — the disciplinary breadth itself reflects the fragmentation of the problem.
- Directly related to existing brief `DIGITAL-autonomous-systems-formal-verification` — that brief covers the verification gap for learning-enabled components; this brief covers the runtime resilience gap for systems operating in open worlds. Together they describe the two halves of the assurance problem for autonomous systems.
- The antifragility concept (Taleb, 2012) is provocative but lacks engineering formalization — the seminar's attempt to ground it in control theory metrics is a first step.
- The `failure:not-attempted` tag is warranted because runtime resilience has been discussed conceptually but not formally defined or measured in practice for autonomous systems.
- The safety/ethical/legal implications of systems that autonomously modify their own behavior remain unresolved and were flagged as a critical concern by seminar participants — this connects to the broader AI governance challenge.
- Use cases proposed for benchmarking include health/assistive care, transportation, and aviation — domains where the consequences of resilience failure are most severe.
"Resilience and Antifragility of Autonomous Systems," Dagstuhl Seminar 24182, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Reports, Vol. 14, Issue 4, pp. 142–163, 2024. DOI: 10.4230/DagRep.14.4.142. https://drops.dagstuhl.de/entities/document/10.4230/DagRep.14.4.142 (accessed 2026-02-12)