Resilient Systems

Traditional security is concerned primarily with making systems more robust to adversarial attacks. This includes designing the system with appropriate authentication and access control, removing vulnerabilities in the design, or detecting intrusions and deploying countermeasures against them. The robustness of the system can be improved by improving its implementation, patching vulnerabilities, or improving detection and response.

In contrast, resilience focusses on the graceful degradation of the system and its recovery in response to attacks. That is, we aim to minimise the amount of time in which the system is unavailable (or cannot provide its functionality) in response to attacks. This can be achieved by increasing the robustness of the system to attacks, but also by reducing the impact of the attack, e.g., by increasing redundancy, or by facilitating the recovery of the system. When only finite resources are available, these strategies compete with each other. To choose the right strategy, or the right investment in the system’s resilience, it is not sufficient to consider the degree of compromise of the system; it is also necessary to determine the impact of the attack. For this reason, our work combines models of attack progression (such as attack graphs) with models of the functionality of the system to reason about the impact of the attack and select countermeasures.
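
To make the distinction between degree of compromise and impact concrete, here is a minimal sketch in Python. It assumes the functionality model is a simple dependency map in which a service stops working as soon as any component it depends on is down. The service names, the `DEPENDS_ON` map and the `impacted` helper are hypothetical and chosen only for illustration; they are not taken from our models.

```python
# Hypothetical dependency model: service -> components it needs to function.
DEPENDS_ON = {
    "billing": ["db", "auth"],
    "portal": ["auth"],
    "auth": ["ldap"],
    "db": [],
    "ldap": [],
    "wiki": [],
}


def impacted(compromised):
    """Return every component that is down, directly or through dependencies.

    Assumption: a component fails as soon as any one of its dependencies fails.
    """
    down = set(compromised)
    changed = True
    while changed:  # repeat until no new component is added (fixed point)
        changed = False
        for service, deps in DEPENDS_ON.items():
            if service not in down and any(d in down for d in deps):
                down.add(service)
                changed = True
    return down


# The same degree of compromise (one component) can have very different impact.
print(sorted(impacted({"ldap"})))  # ['auth', 'billing', 'ldap', 'portal']
print(sorted(impacted({"wiki"})))  # ['wiki']
```

In this toy model a single compromised component takes down either four components or one, depending on where it sits in the dependency structure, which is why the degree of compromise alone is not enough to choose a countermeasure.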

Risk Assessment with Bayesian Attack Graphs

Identifying, modelling, and assessing security risks and prioritizing the most critical threats is essential to optimise the resources for network protection. Attack graphs have proven to be a powerful tool for these tasks. They provide compact representations of the attack paths that attackers can use to compromise valuable resources in complex networks and systems. We are interested in scalable techniques to enable security risk assessment in large networks and infrastructures.

We have developed exact and approximate inference techniques with Bayesian attack graphs for static and dynamic risk assessment, scaling up to networks with thousands of nodes. We are working on new attack graph representations to model the interdependence between different security aspects, including the physical, social, and cyber dimensions. We are also investigating more scalable attack graph generation models capable of coping with the size and the dynamic nature of current network environments, including IoT deployments.
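
As an illustration of what inference on a Bayesian attack graph computes, here is a minimal sketch of exact inference by brute-force enumeration on a four-node toy graph. It assumes that each node combines its parents with a noisy-OR, that static assessment means the unconditional probability of compromise, and that dynamic assessment means the posterior given observed evidence. The graph, the probabilities and the function names are hypothetical. Enumeration is exponential in the number of nodes, so this is not one of the scalable techniques mentioned above; it only shows the quantity being computed.

```python
from itertools import product

# Hypothetical attack graph: node -> list of (parent, exploit success probability).
EDGES = {
    "web": [],
    "db": [("web", 0.5)],
    "app": [("web", 0.4)],
    "admin": [("db", 0.5), ("app", 0.5)],
}
PRIOR = {"web": 0.8}  # assumed probability that the entry node is compromised


def local_prob(node, state):
    """P(node compromised | parent states), assuming a noisy-OR combination."""
    parents = EDGES[node]
    if not parents:
        return PRIOR[node]
    p_all_fail = 1.0
    for parent, p_exploit in parents:
        if state[parent]:
            p_all_fail *= 1.0 - p_exploit
    return 1.0 - p_all_fail


def joint(state):
    """Joint probability of one complete true/false assignment to all nodes."""
    prob = 1.0
    for node in EDGES:
        p = local_prob(node, state)
        prob *= p if state[node] else 1.0 - p
    return prob


def marginal(target, evidence=None):
    """P(target compromised | evidence), summing over every assignment."""
    evidence = evidence or {}
    nodes = list(EDGES)
    numerator = denominator = 0.0
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if any(state[n] != v for n, v in evidence.items()):
            continue  # assignment contradicts the observed evidence
        p = joint(state)
        denominator += p
        if state[target]:
            numerator += p
    return numerator / denominator


static_risk = marginal("admin")                 # no evidence observed
dynamic_risk = marginal("admin", {"db": True})  # after observing "db" compromised
```

With these made-up numbers, `static_risk` is 0.32 and `dynamic_risk` is 0.6: observing a compromise on one attack path raises the estimated risk to the target.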

Recovery or Containment

Combining an attack graph model with a dependency model of the system allows us to investigate not only attack progression but also the impact of the attack in an inter-dependent system. This allows us to build models where we can compare different mitigation and remediation techniques. For example, given limited resources to respond to an attack, is it more effective to focus on containing the attack or on recovering the service performance of the system?

In cases where the availability of the system is fundamental, the question is not easily answered without such models. We find that there are circumstances in which it is more cost-effective to favour recovery over containment, that considering the long-term costs can lead to significant, quantifiable savings, and that the time-to-detection of an attack plays a significant role in the selection of the appropriate response strategy.
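
The kind of comparison involved can be sketched with a deliberately simple discrete-time simulation. It assumes that the attack takes down a fixed number of additional services per step, that a "contain" response spends some steps stopping the spread before any recovery starts, and that a "recover" response restores services immediately while the attack keeps spreading. The function `downtime`, its parameters and all numbers are hypothetical, and the outcome is a property of this toy rather than a result from our models.

```python
def downtime(strategy, detect_at, n=20, spread=1, recover=2,
             contain_steps=1, horizon=100):
    """Total service-steps of unavailability for one simulated attack.

    strategy      -- "contain" (stop the spread first) or "recover" (restore now)
    detect_at     -- time step at which the attack is detected
    n             -- number of services in the system
    spread        -- services newly compromised per step while uncontained
    recover       -- services restored per step once recovery is running
    contain_steps -- steps of effort needed to contain the attack
    """
    down = 1            # one service is compromised at t = 0
    contained = False
    total = 0
    for t in range(horizon):
        if down == 0:
            break       # everything restored, the incident is over
        total += down
        if t < detect_at:
            # Undetected: the attack spreads freely.
            down = min(n, down + spread)
        elif strategy == "contain" and not contained:
            # Containment in progress: no recovery yet, attack still spreads.
            down = min(n, down + spread)
            contained = (t + 1 - detect_at) >= contain_steps
        else:
            # Recovery, competing with the spread unless the attack is contained.
            growth = 0 if contained else spread
            down = max(0, min(n, down + growth - recover))
    return total


for detect_at in (1, 8):
    print(detect_at, downtime("recover", detect_at), downtime("contain", detect_at))
# prints:
# 1 4 7
# 8 81 75
```

With these invented parameters the preferred response flips as detection is delayed, which illustrates why time-to-detection has to be part of the model when comparing response strategies.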

Evaluating Investments in Redundancy and Diversity

We have shown that by combining a model of attack progression with a performance model of the system, it is possible to analyse the configuration of a system and to evaluate the impact that investments in redundancy (with and without diversity) have on the resilience of the system. We find that the cost-effectiveness of redundancy depends on the SLA terms, the probability of attack detection, the time to recover, and the cost of maintenance. In our case study, redundancy with diversity achieved a saving of up to around 50 percent in expected attack costs relative to no redundancy. The overall benefit over time depends on how the saving during attacks compares to the added maintenance costs due to redundancy.
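
The trade-off in the last sentence can be written down as a back-of-the-envelope calculation. The sketch below is a simplified stand-in for the combined attack and performance model, not the model itself. It assumes that an outage requires every replica to be compromised, that an exploit which worked on one replica works on each further replica with probability `p_reuse` (1.0 for identical replicas, lower for diverse ones), and that the SLA penalty is linear in downtime. All function names, parameters and numbers are hypothetical and are not the figures from our case study.

```python
def expected_attack_cost(p_exploit, replicas, p_reuse,
                         p_detect, t_recover, t_undetected,
                         sla_penalty_per_hour):
    """Expected SLA penalty for a single attack, under the assumptions above."""
    # Outage only if the first replica and every further replica are compromised.
    p_outage = p_exploit * p_reuse ** (replicas - 1)
    # Detected attacks are recovered quickly; undetected ones last longer.
    expected_hours = p_detect * t_recover + (1 - p_detect) * t_undetected
    return p_outage * expected_hours * sla_penalty_per_hour


def annual_net_benefit(baseline_cost, redundant_cost,
                       attacks_per_year, extra_maintenance_per_year):
    """Saving during attacks minus the added cost of maintaining the replicas."""
    saving = attacks_per_year * (baseline_cost - redundant_cost)
    return saving - extra_maintenance_per_year


common = dict(p_exploit=0.5, p_detect=0.75, t_recover=4, t_undetected=20,
              sla_penalty_per_hour=1000)
single = expected_attack_cost(replicas=1, p_reuse=1.0, **common)
diverse = expected_attack_cost(replicas=2, p_reuse=0.25, **common)

benefit = annual_net_benefit(single, diverse,
                             attacks_per_year=2,
                             extra_maintenance_per_year=5000)
```

Here `single` is 4000, `diverse` is 1000 and `benefit` is 1000 per year. Raising `extra_maintenance_per_year` above 6000 makes the benefit negative, which is the sense in which the overall benefit depends on how the saving during attacks compares to the added maintenance costs.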

Which Attacks Lead to Safety Violations?

There are many challenges and tensions at the intersection of Security and Safety. We have focussed so far on methodologies to identify which (cyber)attacks lead to violations of the safety properties and to understand which security countermeasures can be applied whilst preserving the safety of the system. Our initial work on this topic can be found here. More is coming soon …