adversarial machine learning

Understanding and mitigating universal adversarial perturbations for computer vision neural networks

Kenneth Tan Co

Deep neural networks (DNNs) have become the algorithm of choice for many computer vision applications. They achieve human-level performance on many vision tasks and enable the automation and large-scale deployment of applications such as object tracking, autonomous vehicles, and medical imaging. However, DNNs expose software applications to systemic vulnerabilities in the form of Universal Adversarial Perturbations (UAPs): input perturbation attacks that can cause DNNs to make classification errors on large sets of inputs.

Our aim is to improve the robustness of computer vision DNNs to UAPs without sacrificing the models’ predictive performance. To this end, we deepen our understanding of these vulnerabilities by investigating the visual structures and patterns that commonly appear in UAPs. We demonstrate the efficacy and pervasiveness of UAPs by showing how Procedural Noise patterns can be used to generate efficient zero-knowledge attacks against different computer vision models and tasks at minimal cost to the attacker. We then evaluate the UAP robustness of various shape- and texture-biased models and find that combining them in ensembles yields only a marginal improvement in robustness.

To mitigate UAP attacks, we develop two novel approaches. First, we propose using the Jacobian of DNNs to measure the sensitivity of computer vision DNNs. We derive theoretical bounds and provide empirical evidence showing that a combination of Jacobian regularisation and ensemble methods increases model robustness against UAPs without degrading predictive performance. Our results evince a robustness-accuracy trade-off against UAPs that is better than that of conventionally trained models. Second, we design a detection method that analyses hidden-layer activation values to identify a variety of UAP attacks in real time with low latency. We show that our method outperforms existing defences under realistic time and computation constraints.
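As a rough illustration of the detection idea, the sketch below flags inputs whose hidden-layer activations deviate from statistics collected on clean data. This is only a minimal sketch, not the thesis's detector: `model`, `feature_layer`, and `clean_loader` are assumed placeholders, and the simple z-score test stands in for the actual detection rule.

```python
# Minimal sketch (not the thesis's exact detector): flag inputs whose
# hidden-layer activations deviate from clean statistics.
# `model`, `feature_layer`, and `clean_loader` are assumed placeholders.
import torch

activations = {}

def _capture(_module, _inp, out):
    activations["feat"] = out.flatten(1).detach()

feature_layer.register_forward_hook(_capture)  # a chosen hidden layer of `model`

# 1) Collect per-dimension statistics of the hidden layer on clean data.
feats = []
with torch.no_grad():
    for x, _ in clean_loader:
        model(x)
        feats.append(activations["feat"])
feats = torch.cat(feats)
mu, sigma = feats.mean(0), feats.std(0) + 1e-6

# 2) At inference time, flag inputs whose mean per-dimension z-score is large.
def is_suspicious(x, threshold=4.0):
    with torch.no_grad():
        model(x)
    z = ((activations["feat"] - mu).abs() / sigma).mean(dim=1)
    return z > threshold
```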

Link to thesis PDF

Cite as: Co, Kenneth Tan, Understanding and mitigating universal adversarial perturbations for computer vision neural networks. PhD Thesis, Department of Computing, Imperial College London, https://doi.org/10.25560/103574, March 2023

Robustness and Transferability of Universal Attacks on Compressed Models

A.G. Matachana, K.T. Co, L. Muñoz-González, D. Martinez, E.C. Lupu. Robustness and Transferability of Universal Attacks on Compressed Models. AAAI 2021 Workshop: Towards Robust, Secure and Efficient Machine Learning. 2021.

Neural network compression methods like pruning and quantization are very effective for efficiently deploying Deep Neural Networks (DNNs) on edge devices. However, DNNs remain vulnerable to adversarial examples: inconspicuous inputs that are specifically designed to fool these models. In particular, Universal Adversarial Perturbations (UAPs) are a powerful class of adversarial attacks which create adversarial perturbations that can generalize across a large set of inputs. In this work, we analyze the effect of various compression techniques on robustness to UAP attacks, including different forms of pruning and quantization. We test the robustness of compressed models to white-box and transfer attacks, comparing them with their uncompressed counterparts on the CIFAR-10 and SVHN datasets. Our evaluations reveal clear differences between pruning methods, including Soft Filter and Post-training Pruning. We observe that UAP transfer attacks between pruned and full models are limited, suggesting that the systemic vulnerabilities across these models are different. This finding has practical implications, as using different compression techniques can blunt the effectiveness of black-box transfer attacks. We show that, in some scenarios, quantization can produce gradient masking, giving a false sense of security. Finally, our results suggest that conclusions about the robustness of compressed models to UAP attacks are application-dependent, as we observe different phenomena in the two datasets used in our experiments.
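A minimal sketch of the kind of comparison this paper performs, assuming a trained PyTorch `model`, a precomputed UAP tensor `uap`, and a `test_loader` (all placeholders). L1 magnitude pruning stands in for the compression methods studied; the paper additionally covers Soft Filter Pruning and quantization.

```python
# Sketch: prune a copy of a model, then measure how a UAP crafted on the full
# model transfers to the pruned one. `model`, `uap`, `test_loader` are assumed.
import copy
import torch
import torch.nn.utils.prune as prune

def prune_model(model, amount=0.5):
    pruned = copy.deepcopy(model)
    for module in pruned.modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            prune.l1_unstructured(module, name="weight", amount=amount)
    return pruned

def universal_evasion_rate(model, uap, loader):
    """Fraction of inputs whose prediction changes when the UAP is added."""
    model.eval()
    flipped, total = 0, 0
    with torch.no_grad():
        for x, _ in loader:
            clean = model(x).argmax(1)
            adv = model((x + uap).clamp(0, 1)).argmax(1)  # assumes inputs in [0, 1]
            flipped += (clean != adv).sum().item()
            total += x.size(0)
    return flipped / total

pruned = prune_model(model, amount=0.5)
print("full  :", universal_evasion_rate(model, uap, test_loader))
print("pruned:", universal_evasion_rate(pruned, uap, test_loader))
```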

Object Removal Attacks on LiDAR-based 3D Object Detectors

LiDARs play a critical role in Autonomous Vehicles’ (AVs) perception and their safe operations. Recent works have demonstrated that it is possible to spoof LiDAR return signals to elicit fake objects. In this work we demonstrate how the same physical capabilities can be used to mount a new, even more dangerous class of attacks, namely Object Removal Attacks (ORAs). ORAs aim to force 3D object detectors to fail. We leverage the default setting of LiDARs that record a single return signal per direction to perturb point clouds in the region of interest (RoI) of 3D objects. By injecting illegitimate points behind the target object, we effectively shift points away from the target objects’ RoIs. Our initial results using a simple random point selection strategy show that the attack is effective in degrading the performance of commonly used 3D object detection models.
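The toy sketch below conveys the point-shifting idea under simple assumptions (a numpy point cloud, a known RoI mask, and the sensor at the origin); it is not the paper's implementation, and the shift distance and perturbation fraction are illustrative.

```python
# Toy sketch of an Object Removal Attack: push a random subset of the returns
# inside a target object's RoI further along their rays (i.e., "behind" the
# object), so a single-return LiDAR no longer records the object's surface.
import numpy as np

def object_removal_attack(points, roi_mask, shift_distance=5.0, fraction=0.8,
                          rng=np.random.default_rng(0)):
    """points: (N, 3) xyz returns; roi_mask: boolean mask of points in the RoI."""
    attacked = points.copy()
    roi_idx = np.flatnonzero(roi_mask)
    n_perturb = int(fraction * len(roi_idx))
    chosen = rng.choice(roi_idx, size=n_perturb, replace=False)

    # Push each chosen return outward along its ray from the sensor origin.
    rays = attacked[chosen]
    ranges = np.linalg.norm(rays, axis=1, keepdims=True)
    attacked[chosen] = rays / ranges * (ranges + shift_distance)
    return attacked
```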

Z. Hau, K.T. Co, S. Demetriou, E.C. Lupu. Object Removal Attacks on LiDAR-based 3D Object Detectors. Automotive and Autonomous Vehicle Security (AutoSec) Workshop @ NDSS Symposium 2021.

Paper

Jacobian Regularization for Mitigating Universal Adversarial Perturbations

Authors: Kenneth Co, David Martinez Rego, Emil Lupu

Universal Adversarial Perturbations (UAPs) are input perturbations that can fool a neural network on large sets of data. They represent a significant threat as they facilitate realistic, practical, and low-cost attacks on neural networks. In this work, we derive upper bounds for the effectiveness of UAPs based on norms of data-dependent Jacobians. We empirically verify that Jacobian regularization greatly increases model robustness to UAPs, by up to four times, whilst maintaining clean performance. Our theoretical analysis also allows us to formulate a metric for the strength of shared adversarial perturbations between pairs of inputs. We apply this metric to benchmark datasets and show that it is highly correlated with the actual observed robustness. This suggests that realistic and practical universal attacks can be reliably mitigated without sacrificing clean accuracy, which shows promise for the robustness of machine learning systems.
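A sketch of what a Jacobian-regularised training objective can look like, using a single random projection to estimate the Frobenius norm of the input-output Jacobian (the estimator follows Hoffman et al.'s Jacobian regularisation; the weight `lam` is illustrative rather than the paper's setting).

```python
# Sketch of a Jacobian-regularised loss: cross-entropy plus an estimate of
# ||J||_F^2 obtained from a single vector-Jacobian product.
import torch
import torch.nn.functional as F

def jacobian_regularised_loss(model, x, y, lam=0.01):
    x = x.requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # Random unit vector in output space; E[||J^T v||^2] is proportional to ||J||_F^2.
    v = torch.randn_like(logits)
    v = v / v.norm(dim=1, keepdim=True)
    jvp = torch.autograd.grad((logits * v).sum(), x, create_graph=True)[0]
    jac_norm = jvp.pow(2).sum(dim=tuple(range(1, jvp.dim()))).mean()

    return ce + lam * jac_norm
```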

Kenneth Co, David Martinez Rego, Emil Lupu, Jacobian Regularization for Mitigating Universal Adversarial Perturbations. 30th International Conference on Artificial Neural Networks (ICANN 21), Sept. 2021.

Pre-print on arxiv

 

Universal Adversarial Robustness of Texture and Shape-Biased Models

Increasing shape-bias in deep neural networks has been shown to improve robustness to common corruptions and noise. In this paper we analyze the adversarial robustness of texture and shape-biased models to Universal Adversarial Perturbations (UAPs). We use UAPs to evaluate the robustness of DNN models with varying degrees of shape-based training. We find that shape-biased models do not markedly improve adversarial robustness, and we show that ensembles of texture and shape-biased models can improve universal adversarial robustness while maintaining strong performance.
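For reference, an output-averaging ensemble of a texture-biased and a shape-biased model might look like the sketch below. This is a minimal illustration: the two model objects are assumed to share the same label space, and the paper's exact ensembling scheme may differ.

```python
# Minimal output-averaging ensemble of two classifiers.
import torch

class AveragingEnsemble(torch.nn.Module):
    def __init__(self, texture_model, shape_model):
        super().__init__()
        self.members = torch.nn.ModuleList([texture_model, shape_model])

    def forward(self, x):
        # Average the softmax outputs of the ensemble members.
        probs = [torch.softmax(m(x), dim=1) for m in self.members]
        return torch.stack(probs).mean(dim=0)
```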

Citation: K. T. Co, L. Muñoz-González, L. Kanthan, B. Glocker and E. C. Lupu, “Universal Adversarial Robustness of Texture and Shape-Biased Models,” 2021 IEEE International Conference on Image Processing (ICIP), 2021, pp. 799-803, doi: 10.1109/ICIP42928.2021.9506325.

Paper in IEEE Archive
Pre-print on arxiv

Procedural Noise Adversarial Examples for Black-Box Attacks on Deep Convolutional Networks (CCS ’19)

Kenneth T. Co, Luis Muñoz-González, Sixte de Maupeou, and Emil C. Lupu. 2019. Procedural Noise Adversarial Examples for Black-Box Attacks on Deep Convolutional Networks. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS ’19). Association for Computing Machinery, New York, NY, USA, 275–289. DOI:https://doi.org/10.1145/3319535.3345660

Abstract: Deep Convolutional Networks (DCNs) have been shown to be vulnerable to adversarial examples—perturbed inputs specifically designed to produce intentional errors in the learning algorithms at test time. Existing input-agnostic adversarial perturbations exhibit interesting visual patterns that are currently unexplained. In this paper, we introduce a structured approach for generating Universal Adversarial Perturbations (UAPs) with procedural noise functions. Our approach unveils the systemic vulnerability of popular DCN models like Inception v3 and YOLO v3, with single noise patterns able to fool a model on up to 90% of the dataset. Procedural noise allows us to generate a distribution of UAPs with high universal evasion rates using only a few parameters. Additionally, we propose Bayesian optimization to efficiently learn procedural noise parameters to construct inexpensive untargeted black-box attacks. We demonstrate that it can achieve an average of less than 10 queries per successful attack, a 100-fold improvement on existing methods. We further motivate the use of input-agnostic defences to increase the stability of models to adversarial perturbations. The universality of our attacks suggests that DCN models may be sensitive to aggregations of low-level class-agnostic features. These findings give insight on the nature of some universal adversarial perturbations and how they could be generated in other applications.
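As a loose illustration of how few parameters such perturbations require, the sketch below generates a low-frequency sine pattern bounded by an L-infinity budget. The paper itself uses Perlin and Gabor noise and tunes their parameters with Bayesian optimisation, so this simpler pattern is only a stand-in.

```python
# Illustrative stand-in for a procedural noise UAP: a sinusoidal pattern
# described by a handful of parameters and bounded by an L-infinity budget.
import numpy as np

def sine_noise_perturbation(size=224, freq=16.0, angle=0.8, phase=0.0, eps=16 / 255):
    xs, ys = np.meshgrid(np.linspace(0, 1, size), np.linspace(0, 1, size))
    direction = xs * np.cos(angle) + ys * np.sin(angle)
    pattern = np.sin(2 * np.pi * freq * direction + phase)
    perturbation = eps * pattern                           # lies in [-eps, eps]
    return np.repeat(perturbation[..., None], 3, axis=2)   # tile across RGB

# A black-box attacker would search over (freq, angle, phase), e.g. with
# Bayesian optimisation, and keep the setting with the highest evasion rate.
```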

MUSKETEER: Machine learning to augment shared knowledge in federated privacy-preserving scenarios

The massive increase in data collected and stored worldwide calls for new ways to preserve privacy while still allowing data sharing among multiple data owners. Today, the lack of trusted and secure environments for data sharing inhibits the data economy, while legality, privacy, trustworthiness, data value and confidentiality concerns hamper the free flow of data. By the end of the project, MUSKETEER aims to create a validated, federated, privacy-preserving machine learning platform tested on industrial data that is interoperable, scalable and efficient enough to be deployed in real use cases. MUSKETEER aims to alleviate data-sharing barriers by providing secure, scalable and privacy-preserving analytics over decentralized datasets using machine learning. Data can continue to be stored in different locations with different privacy constraints, yet still be shared securely. The MUSKETEER cross-domain platform will validate progress in the industrial scenarios of smart manufacturing and health. MUSKETEER strives to (1) create machine learning models over a variety of privacy-preserving scenarios, (2) ensure security and robustness against external and internal threats, (3) provide a standardized and extendable architecture, (4) demonstrate and validate the platform in two different industrial scenarios, and (5) enhance the data economy by boosting data sharing across domains. The MUSKETEER impact crosses industrial, scientific, economic and strategic domains. Real-world industry requirements and outcomes are validated in an operational setting. Federated machine learning approaches for data sharing are advanced. The data economy is fostered by creating a rewarding model capable of fairly monetizing datasets according to their real value. Finally, Europe is positioned as a leader in innovative data-sharing technologies.

Project Introduction
H2020

Label Sanitization against Label Flipping Poisoning Attacks

Andrea Paudice, Luis Muñoz-González, Emil C. Lupu. 2018. Label Sanitization against Label Flipping Poisoning Attacks. arXiv preprint arXiv:1803.00992.

Many machine learning systems rely on data collected in the wild from untrusted sources, exposing the learning algorithms to data poisoning. Attackers can inject malicious data into the training dataset to subvert the learning process, compromising the performance of the algorithm and producing errors in a targeted or indiscriminate way. Label flipping attacks are a special case of data poisoning, where the attacker can control the labels assigned to a fraction of the training points. Even if the capabilities of the attacker are constrained, these attacks have been shown to be effective at significantly degrading the performance of the system. In this paper we propose an efficient algorithm to perform optimal label flipping poisoning attacks and a mechanism to detect and relabel suspicious data points, mitigating the effect of such poisoning attacks.
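A minimal sketch of kNN-based relabeling in the spirit of this defence (the neighbourhood size and agreement threshold are illustrative, not the paper's exact configuration):

```python
# Sketch: relabel a training point when a large majority of its nearest
# neighbours disagree with its current label.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def sanitise_labels(X, y, k=10, agreement=0.9):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)              # first neighbour is the point itself
    y_clean = y.copy()
    for i, neigh in enumerate(idx[:, 1:]):
        labels, counts = np.unique(y[neigh], return_counts=True)
        majority = labels[counts.argmax()]
        if majority != y[i] and counts.max() / k >= agreement:
            y_clean[i] = majority          # relabel a suspected flipped point
    return y_clean
```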

Detection of Adversarial Training Examples in Poisoning Attacks through Anomaly Detection

Andrea Paudice, Luis Muñoz-González, Andras Gyorgy, Emil C. Lupu. 2018. Detection of Adversarial Training Examples in Poisoning Attacks through Anomaly Detection. arXiv preprint arXiv:1802.03041.

Machine learning has become an important component for many systems and applications including computer vision, spam filtering, malware and network intrusion detection, among others. Despite the capabilities of machine learning algorithms to extract valuable information from data and produce accurate predictions, it has been shown that these algorithms are vulnerable to attacks.

Data poisoning is one of the most relevant security threats against machine learning systems, where attackers can subvert the learning process by injecting malicious samples into the training data. Recent work in adversarial machine learning has shown that the so-called optimal attack strategies can successfully poison linear classifiers, dramatically degrading the performance of the system after compromising only a small fraction of the training dataset. In this paper we propose a defence mechanism based on outlier detection to mitigate the effect of these optimal poisoning attacks. We show empirically that the adversarial examples generated by these attack strategies are quite different from genuine points, as no detectability constraints are considered when crafting the attack. Hence, they can be detected with an appropriate pre-filtering of the training dataset.
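The sketch below illustrates the pre-filtering idea under simple assumptions: a per-class outlier detector is fitted on a small trusted subset and used to drop flagged training points. The detector choice (Isolation Forest) and contamination rate are placeholders rather than the paper's exact method.

```python
# Sketch: per-class outlier detection on a trusted subset, then filter the
# (possibly poisoned) training set before learning.
import numpy as np
from sklearn.ensemble import IsolationForest

def filter_training_set(X_train, y_train, X_trusted, y_trusted, contamination=0.1):
    keep = np.ones(len(X_train), dtype=bool)
    for c in np.unique(y_trusted):
        detector = IsolationForest(contamination=contamination, random_state=0)
        detector.fit(X_trusted[y_trusted == c])
        mask = y_train == c
        keep[mask] = detector.predict(X_train[mask]) == 1   # -1 marks outliers
    return X_train[keep], y_train[keep]
```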

Towards Poisoning Deep Learning Algorithms with Back-gradient Optimization

Luis Muñoz-González, Battista Biggio, Ambra Demontis, Andrea Paudice, Vasin Wongrassamee, Emil C. Lupu, Fabio Roli. “Towards Poisoning Deep Learning Algorithms with Back-gradient Optimization.” Workshop on Artificial Intelligence and Security (AISec), 2017.

A number of online services nowadays rely upon machine learning to extract valuable information from data collected in the wild. This exposes learning algorithms to the threat of data poisoning, i.e., a coordinated attack in which a fraction of the training data is controlled by the attacker and manipulated to subvert the learning process. To date, these attacks have been devised only against a limited class of binary learning algorithms, due to the inherent complexity of the gradient-based procedure used to optimize the poisoning points (a.k.a. adversarial training examples).

In this work, we first extend the definition of poisoning attacks to multi-class problems. We then propose a novel poisoning algorithm based on the idea of back-gradient optimization, i.e., to compute the gradient of interest through automatic differentiation, while also reversing the learning procedure to drastically reduce the attack complexity. Compared to current poisoning strategies, our approach is able to target a wider class of learning algorithms, trained with gradient-based procedures, including neural networks and deep learning architectures. We empirically evaluate its effectiveness on several application examples, including spam filtering, malware detection, and handwritten digit recognition. We finally show that, similarly to adversarial test examples, adversarial training examples can also be transferred across different learning algorithms.
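Below is a simplified sketch of the underlying bilevel optimisation for a single poisoning point on a tiny logistic-regression learner. The paper's back-gradient method reverses the learning procedure rather than storing the whole unrolled graph; plain unrolled autograd is used here only to convey the idea, and all hyperparameters are illustrative.

```python
# Sketch: unroll a few inner SGD steps on poisoned data, then differentiate the
# attacker's validation loss with respect to the poisoning point x_p.
import torch
import torch.nn.functional as F

def poisoning_gradient(X_tr, y_tr, X_val, y_val, x_p, y_p, steps=20, lr=0.1):
    """Binary labels are floats in {0., 1.}; x_p is a single feature vector."""
    x_p = x_p.clone().requires_grad_(True)
    w = torch.zeros(X_tr.shape[1], requires_grad=True)
    b = torch.zeros(1, requires_grad=True)

    for _ in range(steps):                        # inner loop: train on poisoned data
        X = torch.cat([X_tr, x_p.unsqueeze(0)])
        y = torch.cat([y_tr, y_p.unsqueeze(0)])
        loss = F.binary_cross_entropy_with_logits(X @ w + b, y)
        gw, gb = torch.autograd.grad(loss, (w, b), create_graph=True)
        w, b = w - lr * gw, b - lr * gb

    # Outer objective: the attacker ascends this gradient to maximise validation loss.
    val_loss = F.binary_cross_entropy_with_logits(X_val @ w + b, y_val)
    return torch.autograd.grad(val_loss, x_p)[0]
```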

This work was done in collaboration with the PRA Lab at the University of Cagliari, Italy.