Cyber Security

Indiscriminate data poisoning against supervised learning: general attack formulations, robust defences, and poisonability

Javier Carnerero Cano

Abstract

Machine learning (ML) systems often rely on data collected from untrusted sources, such as humans or sensors, that can be compromised. These scenarios expose ML algorithms to data poisoning attacks, where adversaries manipulate a fraction of the training data to degrade the ML system’s performance. However, previous works lack a systematic evaluation of attacks that considers the ML pipeline, and they focus on classification settings. This is concerning, since regression models are also applied in safety-critical systems. We characterise indiscriminate data poisoning attacks and defences in worst-case scenarios against supervised learning algorithms, considering the ML pipeline: data sanitisation, hyperparameter learning, and training. We propose a novel attack formulation that considers the effect of the attack on the model’s hyperparameters. We apply this attack formulation to several ML classifiers using L2 and L1 regularisation. Our evaluation shows the benefits of using regularisation to help mitigate poisoning attacks when hyperparameters are learnt using a trusted dataset. We then introduce a threat model for poisoning attacks against regression models and propose a novel stealthy attack formulation via multiobjective bilevel optimisation, where the two objectives are attack effectiveness and detectability. We experimentally show that state-of-the-art defences do not mitigate these stealthy attacks. Furthermore, we theoretically justify the detectability objective and the methodology designed. We also propose a novel defence, built upon Bayesian linear regression, that rejects points based on the model’s predictive variance. We empirically show its effectiveness in mitigating stealthy attacks and attacks with a large fraction of poisoning points. Finally, we introduce the concept of “poisonability”, which allows us to find the number of poisoning points required so that the mean error of the clean points matches the mean error of the poisoning points on the poisoned model. This challenges the underlying assumption of most defences, namely that poisoning points can be told apart from clean points by their larger error. Specifically, we determine the poisonability of linear regression.
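The abstract mentions a defence, built on Bayesian linear regression, that rejects points using the model’s predictive variance. The sketch below shows one plausible way such a filter could work; it is not the thesis’s algorithm. It assumes a known prior precision `alpha` and noise precision `beta`, and uses a hypothetical rule (function names and the `threshold` parameter are illustrative) that drops any training point whose squared residual exceeds `threshold` times its predictive variance, refitting on the kept points until the kept set stops changing.

```python
import numpy as np

def blr_posterior(X, y, alpha=1.0, beta=100.0):
    """Weight posterior for Bayesian linear regression with prior
    N(0, alpha^-1 I) and Gaussian noise of precision beta (both assumed known)."""
    d = X.shape[1]
    precision = alpha * np.eye(d) + beta * X.T @ X
    S = np.linalg.inv(precision)          # posterior covariance
    m = beta * S @ X.T @ y                # posterior mean
    return m, S

def predictive(X, m, S, beta=100.0):
    """Predictive mean and variance (noise variance + weight uncertainty)."""
    mean = X @ m
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", X, S, X)
    return mean, var

def reject_by_predictive_variance(X, y, alpha=1.0, beta=100.0,
                                  threshold=9.0, n_iter=10):
    """Hypothetical filter: drop points whose squared residual exceeds
    `threshold` times the predictive variance, refitting on the kept points."""
    keep = np.ones(len(y), dtype=bool)
    for _ in range(n_iter):
        m, S = blr_posterior(X[keep], y[keep], alpha, beta)
        mean, var = predictive(X, m, S, beta)
        new_keep = (y - mean) ** 2 <= threshold * var
        if np.array_equal(new_keep, keep):
            break                          # converged: the kept set is stable
        keep = new_keep
    return keep
```

With `beta=100` (noise standard deviation 0.1) and `threshold=9`, this amounts to a 3-sigma rule on the predictive distribution. A stealthy attack of the kind described above would aim to keep its points inside exactly this kind of band.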

RESICS: Resilience and Safety to attacks in Industrial Control and Cyber-Physical Systems

We all critically depend on digital systems that sense and control physical processes and environments. Electricity, gas, water and other utilities require the continuous operation of both national and local infrastructures. Industrial processes, such as chemical manufacturing, materials production and manufacturing chains, similarly lie at this intersection of the digital and the physical, as do other cyber-physical systems (CPS) such as robots, autonomous cars and drones. Ensuring the resilience of such systems, that is, their survivability and continued operation when exposed to malicious threats, requires integrating methods and processes from security analysis, safety analysis, system design and operation. These have traditionally been applied separately, and each involves specialist skills and a significant amount of human effort. This is not only costly but also error-prone, and it delays the response to security events.

RESICS aims to significantly advance the state of the art and deliver novel contributions that facilitate:

  • Risk analysis in the face of adversarial threats, taking into account the impact of security events across cascading inter-dependencies
  • Characterising attacks that can have an impact on system safety and identifying the paths that make such attacks possible
  • Identifying countermeasures that can be applied to mitigate threats and contain the impact of attacks
  • Ensuring that such countermeasures can be applied whilst preserving the system’s safety and operational constraints and maximising its availability

These contributions will be evaluated across several test beds, digital twins, a cyber range and a number of use-cases across different industry sectors.

To achieve these goals, RESICS will combine model-driven and empirical approaches across both security and safety analysis, adopting a systems-thinking approach that treats security, safety and resilience as emergent properties of the system. RESICS builds on preliminary results in integrating safety and security methodologies, applying formal methods, and combining model-based and empirical approaches to analyse inter-dependencies in industrial control systems (ICSs) and CPSs.

Funded by DSTL, this is a joint project between the Resilient Information Systems Security (RISS) Group at Imperial College and the Bristol Cyber Security Group. The work will be conducted in collaboration with Adelard (part of NCC Group), Airbus, QinetiQ, Reperion, Siemens and Thales as industry partners, and CMU, the University of Naples and SUTD as academic partners. The project is affiliated with the Research Institute in Trustworthy Inter-Connected Cyber-Physical Systems (RITICS).

Project Publications

  • L. M. Castiglione, S. Guerra and E. C. Lupu. “Automated Identification of Safety-Critical Attacks against CPS and Generation of Assurance Case Fragments.” To be presented at the Safety-Critical Systems Symposium (SSS’25).
  • K. Mathuros, S. Venugopalan and S. Adepu. “WaXAI: Explainable Anomaly Detection in Industrial Control Systems and Water Systems.” Proceedings of the 10th ACM Cyber-Physical System Security Workshop, 2024. Best Paper Award.
  • R. Wang, S. Venugopalan and S. Adepu. “Safety Analysis for Cyber-Physical Systems under Cyber Attacks Using Digital Twin.” IEEE Cyber Security and Resilience, 2024.

Other relevant publications

Presentations

Hazard Driven Threat Modelling for Cyber Physical Systems

Luca Maria Castiglione and Emil C. Lupu. 2020. Hazard Driven Threat Modelling for Cyber Physical Systems. In Proceedings of the 2020 Joint Workshop on CPS&IoT Security and Privacy (CPSIOTSEC’20). Association for Computing Machinery, New York, NY, USA, 13–24.

Adversarial actors have shown their ability to infiltrate enterprise networks deployed around Cyber Physical Systems (CPSs) through social engineering, credential stealing and file-less infections. Once inside, they can gain enough privileges to maliciously call legitimate APIs and apply unsafe control actions to degrade the system’s performance and undermine its safety. Our work lies at the intersection of security and safety, and aims to understand dependencies among security, reliability and safety in CPS/IoT. We present a methodology to perform hazard-driven threat modelling and impact assessment in the context of CPSs. The process starts from the analysis of behavioural, functional and architectural models of the CPS. We then apply System Theoretic Process Analysis (STPA) to the functional model to highlight high-level abuse cases. We leverage a mapping between the architectural and the system-theoretic (ST) models to enumerate those components whose impairment provides the attacker with enough privileges to tamper with or disrupt the data flows. This enables us to find a causal connection between the attack surface (in the architectural model) and system-level losses. We then link the behavioural and system-theoretic representations of the CPS to quantify the impact of the attack. Using our methodology, it is possible to compute a comprehensive attack graph of the known attack paths and to perform both a qualitative and a quantitative impact assessment of the exploitation of vulnerabilities affecting target nodes. The framework and methodology are illustrated using a small-scale example featuring a Communication Based Train Control (CBTC) system. Aspects regarding the scalability of our methodology and its application in real-world scenarios are also considered. Finally, we discuss the possibility of using the results obtained to engineer both design-time and real-time defensive mechanisms.
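The abstract links the attack surface to system-level losses and then computes an attack graph of the known attack paths. As a rough illustration only, the sketch below enumerates every simple path from an entry component to a loss node with a depth-first search over a dependency graph. The graph, its node names and the `attack_paths` helper are hypothetical and heavily simplified; the paper derives such connections from STPA and model mappings, which this sketch does not reproduce.

```python
from typing import Callable, Dict, List

# Hypothetical, simplified CBTC-like model (illustrative only).
# Edge u -> v means "compromising u lets the attacker tamper with or disrupt v".
GRAPH: Dict[str, List[str]] = {
    "operator_workstation": ["supervision_server"],
    "supervision_server": ["zone_controller"],
    "wayside_radio": ["zone_controller", "onboard_controller"],
    "zone_controller": ["movement_authority_flow"],
    "onboard_controller": ["braking_command_flow"],
    "movement_authority_flow": ["loss:collision"],
    "braking_command_flow": ["loss:collision", "loss:service_disruption"],
}

def attack_paths(graph: Dict[str, List[str]], entry: str,
                 is_loss: Callable[[str], bool]) -> List[List[str]]:
    """Enumerate every simple path from an entry point to a loss node (DFS)."""
    paths: List[List[str]] = []
    stack = [(entry, [entry])]
    while stack:
        node, path = stack.pop()
        if is_loss(node):
            paths.append(path)      # reached a system-level loss
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:     # keep paths simple (no cycles)
                stack.append((nxt, path + [nxt]))
    return paths
```

Running `attack_paths` from each entry point on the attack surface and merging the resulting paths gives a simple attack graph. Weighting edges (for example by exploit likelihood) would be one way to move from this qualitative view towards a quantitative impact assessment.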

A Formal Approach to Analyzing Cyber-Forensics Evidence

Erisa Karafili’s paper “A Formal Approach to Analyzing Cyber-Forensics Evidence” was accepted at the European Symposium on Research in Computer Security (ESORICS) 2018. This work is part of the AF-Cyber Project and was a collaboration with King’s College London and the University of Verona.

Title: A Formal Approach to Analyzing Cyber-Forensics Evidence

Authors: Erisa Karafili, Matteo Cristani, Luca Viganò

Abstract: The frequency and harmfulness of cyber-attacks are increasing every day, and with them also the amount of data that cyber-forensics analysts need to collect and analyze. In this paper, we propose a formal analysis process that allows an analyst to filter the enormous amount of evidence collected and either identify crucial information about the attack (e.g., when it occurred, its culprit, its target) or, at the very least, perform a pre-analysis to reduce the complexity of the problem in order to then draw conclusions more swiftly and efficiently. We introduce the Evidence Logic EL for representing simple and derived pieces of evidence from different sources. We propose a procedure, based on monotonic reasoning, that rewrites the pieces of evidence with the use of tableau rules, based on relations of trust between sources and the reasoning behind the derived evidence, and yields a consistent set of pieces of evidence. As a proof of concept, we apply our analysis process to a concrete cyber-forensics case study.
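The procedure above uses trust relations between sources to turn conflicting evidence into a consistent set. The toy sketch below conveys only that core idea and is not the paper’s Evidence Logic or its tableau rules: each piece of evidence is a hypothetical `(proposition, value, source)` triple, sources carry an assumed integer trust level, and a contradiction is resolved in favour of the more trusted source, or left undecided on a tie.

```python
from typing import Dict, List, Tuple

Evidence = Tuple[str, bool, str]  # (proposition, holds?, source) -- hypothetical encoding

def consistent_evidence(evidence: List[Evidence],
                        trust: Dict[str, int]) -> Dict[str, bool]:
    """Resolve contradictions by source trust (higher int = more trusted).

    Returns a contradiction-free mapping proposition -> truth value; if the
    most trusted sources on each side are equally trusted, the proposition
    is left out (undecided) rather than guessed."""
    best: Dict[str, Dict[bool, int]] = {}
    for prop, value, source in evidence:
        level = trust.get(source, 0)          # unknown sources get trust 0
        by_value = best.setdefault(prop, {})
        by_value[value] = max(by_value.get(value, level), level)
    result: Dict[str, bool] = {}
    for prop, by_value in best.items():
        if len(by_value) == 1:                # no conflict
            result[prop] = next(iter(by_value))
        elif by_value[True] != by_value[False]:
            result[prop] = by_value[True] > by_value[False]
    return result
```

Leaving ties undecided is one design choice among several. It keeps the output consistent without inventing a conclusion, which matches the abstract’s framing of a pre-analysis that narrows the problem for the analyst.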

 


This work was funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 746667.

AF-Cyber: Logic-based Attribution and Forensics in Cyber Security

Connected devices will continue to grow in volume and variety, and this increase in connectivity is driving a sharp rise in cyber attacks. Protective measures alone are not enough; identifying who carried out an attack is crucial for preventing cyber attacks from escalating. Forensics is therefore becoming essential for reducing and mitigating attacks in cyber security. Forensics and attribution come with their own challenges, such as the difficulty of collecting suitable evidence and the wide range of anti-forensics tools that attackers use to cover their traces.

The main goal of AF-Cyber is to investigate and analyse the problem of attributing cyber attacks. We plan to construct a logic-based framework for attributing cyber attacks, based on cyber-forensics evidence, social science approaches and an intelligent methodology for dynamic evidence collection. AF-Cyber will help address part of the cyber-attack problem by supporting forensic investigation and attribution with logic-based frameworks for representation and reasoning, together with supporting tools. AF-Cyber is multidisciplinary and collaborative, bridging forensics in cyber attacks, theoretical computer science (logics and formal proofs), security, software engineering and social science.

AF-Cyber received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 746667.

Argumentation-based Security for Social Good

The paper “Argumentation-based Security for Social Good”, presented at the AAAI Fall Symposia 2017, is now available as an AAAI Technical Report.

Title: Argumentation-Based Security for Social Good

Authors: Erisa Karafili, Antonis C. Kakas, Nikolaos I. Spanoudakis, Emil C. Lupu

Abstract: The increase in connectivity and the impact it has on everyday life are raising new and existing security problems that are becoming important for social good. We introduce two particular problems: cyber attack attribution and regulatory data sharing. For both problems, decisions about which rules to apply should be taken under incomplete and context-dependent information. The solution we propose is based on argumentation reasoning, which is a well-suited technique for implementing decision-making mechanisms under conflicting and incomplete information. Our proposal allows us to identify the attacker of a cyber attack and to decide which regulation rule should be applied when using and sharing data. We illustrate our solution through concrete examples.
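To give a feel for how argumentation reasoning resolves conflicting information, the sketch below computes the grounded extension of a Dung-style abstract argumentation framework. This is a standard textbook formulation and not necessarily the framework used in the paper. The arguments in the usage example are hypothetical: an attribution claim based on IP evidence, a counter-argument that the IP was spoofed, and a rebuttal that the spoofing log was forged.

```python
from typing import Set, Tuple

def grounded_extension(args: Set[str], attacks: Set[Tuple[str, str]]) -> Set[str]:
    """Least fixed point of F(S) = {a | every attacker of a is attacked by S}."""
    attackers = {a: {x for (x, y) in attacks if y == a} for a in args}
    extension: Set[str] = set()
    while True:
        # b is defeated if some already-accepted argument attacks it
        defeated = {b for (x, b) in attacks if x in extension}
        # a is acceptable if all of its attackers are defeated
        new = {a for a in args if attackers[a] <= defeated}
        if new == extension:
            return extension
        extension = new

# Hypothetical attribution example: "forged_log" attacks "spoofing",
# which in turn attacks "ip_evidence".
accepted = grounded_extension(
    {"ip_evidence", "spoofing", "forged_log"},
    {("spoofing", "ip_evidence"), ("forged_log", "spoofing")},
)
```

Here the unattacked rebuttal `forged_log` is accepted, which defeats `spoofing` and reinstates `ip_evidence`, so `accepted` is `{"ip_evidence", "forged_log"}`. Grounded semantics is deliberately sceptical: two arguments that only attack each other both stay out, which suits decisions taken under incomplete information.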

The paper can be found at the following link: https://aaai.org/ocs/index.php/FSS/FSS17/paper/view/15928/15306

A video of the presentation can be found on the AI for Social Good workshop page and at the following link: https://youtu.be/wYg8jaHPbyw?t=33m33s