Adversarial Machine Learning

Machine learning has produced a disruptive change in society, bringing both economic and societal benefits across a wide range of sectors. However, machine learning algorithms are vulnerable and can be an appealing target for attackers, who can inject malicious data to degrade the system's performance in a targeted or an indiscriminate way when the learning algorithm is retrained. Attackers can also use machine learning as a weapon to exploit the weaknesses and blind spots of the system at test time, producing intentional misbehaviour. We are interested in understanding the mechanisms that allow a sophisticated attacker to compromise a machine learning system, and in developing new defensive mechanisms to mitigate these attacks. We are also interested in developing new design and testing methodologies for more secure machine learning systems that are resilient to sophisticated attackers.

Universal Adversarial Perturbations in Evasion Attacks

It is now well understood that many machine learning algorithms are vulnerable at run time to adversarial examples: small perturbations to the input that cause misclassification or considerable variation in output. Such perturbations are most frequently calculated for each individual input separately. In contrast, we have focussed on Universal Adversarial Perturbations (UAPs), where a single perturbation causes errors on a large part of the input set. Such perturbations are in our view more dangerous because they can be easily applied at scale to large parts of the input set.
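The defining property can be made concrete with a minimal sketch, in which a toy linear classifier stands in for a real model and all names (`predict`, `uap_error_rate`) are hypothetical: a single fixed perturbation is added to every input, and we measure the fraction of predictions that flip.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(model_w, x):
    # Toy linear classifier: the sign of a dot product stands in for a DNN.
    return np.sign(x @ model_w)

def uap_error_rate(model_w, inputs, delta):
    """Fraction of inputs whose prediction changes when the SAME
    perturbation delta is added to every input: the defining property
    of a universal adversarial perturbation."""
    clean = predict(model_w, inputs)
    perturbed = predict(model_w, inputs + delta)
    return float(np.mean(clean != perturbed))

# Toy data: 1000 inputs of dimension 64, L-infinity budget eps.
w = rng.normal(size=64)
X = rng.normal(size=(1000, 64))
eps = 0.1
delta = eps * np.sign(w)  # one perturbation reused across all inputs
rate = uap_error_rate(w, X, delta)
```

A per-input attack would recompute `delta` for each row of `X`; the point of a UAP is that the same `delta` already flips a substantial fraction of predictions.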

Our first finding demonstrated that procedural noise such as Perlin noise acts as a UAP, and we have shown that it is possible, using Bayesian optimisation, to build effective black-box attacks that evade image classification, image detection and image segmentation on very large numbers of inputs (we have also shown a similar sensitivity to Gabor noise). This demonstrates a significant and systemic vulnerability of neural networks when processing images. Some papers have argued that DNNs primarily rely on texture, so we have investigated the adversarial robustness of both texture- and shape-biased models to UAPs. Our findings show that shape-biased models are equally vulnerable to UAPs, just to different ones.
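What makes procedural noise attractive to an attacker is its low-dimensional parameterisation: a black-box search only has to tune a handful of knobs. The sketch below is a simplified, hypothetical stand-in (a single oriented sinusoid rather than true Perlin or Gabor noise) that shows the structure such a perturbation takes.

```python
import numpy as np

def procedural_noise_pattern(height, width, freq, angle, eps):
    """Simplified procedural noise: one oriented sinusoid, clipped to an
    L-infinity budget eps. Perlin or Gabor noise as used in the actual
    attacks is richer, but shares this low-dimensional parameterisation."""
    ys, xs = np.mgrid[0:height, 0:width]
    proj = xs * np.cos(angle) + ys * np.sin(angle)
    pattern = np.sin(2 * np.pi * freq * proj)
    # Clip defensively so the budget holds regardless of the pattern used.
    return np.clip(eps * pattern, -eps, eps)

# A black-box search (e.g. Bayesian optimisation) would tune (freq, angle)
# against the target model's error rate; here we just build one candidate.
delta = procedural_noise_pattern(32, 32, freq=0.1, angle=0.5, eps=8 / 255)
```

Because only two parameters are searched per candidate, the attacker can evaluate the model's error rate on a whole batch of images for each query, which is what makes the attack cheap to run at scale.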

The attack surface for adversarial examples is thus very large, and this raises interesting questions about the effectiveness of techniques such as adversarial training, which can only patch this attack surface with training examples. Given these observations about the attack surface, we believe that Jacobian regularisation offers a promising and more systematic approach to increasing robustness to UAPs. Our early results suggest that Jacobian regularisation can improve robustness to UAPs to a comparable or better degree than adversarial training, without sacrificing clean accuracy. We are also working on the vulnerability and sensitivity of compressed machine learning models to adversarial attacks.
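The idea behind Jacobian regularisation can be illustrated with a hedged sketch: a finite-difference estimate of the input Jacobian's Frobenius norm for a toy two-layer network, added to the training loss as a penalty. The network, function names and weighting are hypothetical; practical implementations compute the Jacobian (or a random projection of it) by backpropagation rather than finite differences.

```python
import numpy as np

rng = np.random.default_rng(1)

def net(params, x):
    W1, W2 = params
    return W2 @ np.tanh(W1 @ x)  # tiny two-layer net standing in for a DNN

def jacobian_fro_norm(params, x, h=1e-5):
    """Finite-difference estimate of ||df/dx||_F at x. Penalising this norm
    limits how much any small input perturbation can move the output."""
    d = x.shape[0]
    cols = []
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        cols.append((net(params, x + e) - net(params, x - e)) / (2 * h))
    J = np.stack(cols, axis=1)  # shape: (output_dim, input_dim)
    return float(np.linalg.norm(J))

def regularised_loss(params, x, y, lam=0.1):
    # Hypothetical combined objective: task loss plus the Jacobian penalty.
    err = net(params, x) - y
    return float(err @ err) + lam * jacobian_fro_norm(params, x) ** 2

params = (rng.normal(size=(8, 4)), rng.normal(size=(2, 8)))
loss = regularised_loss(params, rng.normal(size=4), np.zeros(2))
```

Unlike adversarial training, which only flattens the loss around the specific perturbations seen during training, the penalty constrains sensitivity in every input direction at once, which is why it is a natural candidate against universal perturbations.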

Stealthiness and Robustness in Poisoning Attacks

Poisoning attacks occur when an adversary can manipulate a part of the data used for training, and does so with the objective of degrading the performance of the machine learning algorithm in either an indiscriminate or an error-specific way. Such attacks are particularly dangerous for machine learning algorithms, and increasingly common, as training data is often collected from IoT devices that can be easily compromised, or from a supply chain whose suppliers are trusted to different extents. Our work in this area has focussed on understanding the effects of poisoning attacks and increasing the robustness of training procedures to them. We have proposed new, more effective ways of computing optimal poisoning attacks. These play a significant role in understanding what an attacker can achieve and how vulnerable the algorithms are in the worst-case scenario.
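The flavour of such computations can be shown with a minimal, hypothetical sketch for ridge regression, where the inner training problem has a closed form and the attacker ascends the gradient of the test loss with respect to a single poison label, subject to a bound that plays the role of a detectability constraint. This is a toy stand-in, not our actual formulation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Clean ridge-regression task (the "victim" learner).
d, n, lam = 5, 100, 1.0
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)
X_test = rng.normal(size=(200, d))
y_test = X_test @ w_true

# Fixed high-leverage poison features; the attacker optimises only the label.
xp = 5.0 * rng.normal(size=d)
A_inv = np.linalg.inv(X.T @ X + np.outer(xp, xp) + lam * np.eye(d))

def train_and_eval(yp):
    """Solve the inner (training) problem in closed form, then return the
    attacker's outer objective (test MSE) and its gradient w.r.t. yp."""
    w = A_inv @ (X.T @ y + xp * yp)
    resid = X_test @ w - y_test
    loss = float(np.mean(resid ** 2))
    grad = float(2 * np.mean(resid[:, None] * X_test, axis=0) @ A_inv @ xp)
    return loss, grad

# Gradient ASCENT on the poison label, clipped to a detectability bound.
yp = 0.0
for _ in range(100):
    _, g = train_and_eval(yp)
    yp = float(np.clip(yp + 0.2 * np.sign(g), -10.0, 10.0))

base_loss, _ = train_and_eval(0.0)
poisoned_loss, _ = train_and_eval(yp)
```

Differentiating the attacker's objective through the learner's training problem is the bilevel structure at the heart of optimal poisoning; for learners without a closed-form solution, the inner problem is differentiated implicitly or unrolled.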

Automatically generated poisoning points (red) for different levels of aggressiveness of the attack (i.e., permissible levels of detectability), as given by the parameter α.

We have also shown that it is possible to train systems to automatically generate poisoning points with different levels of aggressiveness of the attack (i.e., detectability), and thus that it is possible to poison systems at scale. Finally, on the defensive side, we have shown that regularisation can help mitigate the effect of poisoning attacks, provided that the hyper-parameters are adjusted to the aggressiveness of the attack. Choosing fixed values for the hyper-parameters either leaves the algorithm insufficiently robust (when the values are too low) or damages accuracy when there is no attack (when the values are too high). We have proposed formulations where the hyper-parameters can be learnt, and thus adapted to the attacks experienced.

Adversarial Aspects in Federated Machine Learning

As part of the Musketeer Project, we have worked in particular on poisoning attacks in federated learning settings. Our work has shown that in such settings the algorithms are highly vulnerable even when only one of the participants behaves maliciously. Federated learning is highly desirable because it enables training a model without sharing the data; however, it is also highly vulnerable and needs to be protected before it can be applied in practice.
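How a single participant can do this is illustrated by the following hedged sketch of a model-replacement-style attack in a FedAvg-like loop. It assumes the malicious client can anticipate the honest updates, an illustrative worst case rather than the Musketeer project's actual threat model, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

d, n_clients = 5, 10
w_true = rng.normal(size=d)
clients = []
for _ in range(n_clients):
    Xc = rng.normal(size=(50, d))
    clients.append((Xc, Xc @ w_true))  # each client holds private data

w_target = w_true + 5.0  # hypothetical model the attacker wants deployed

def local_update(w, X, y, lr=0.1, steps=10):
    # Honest client step: a few gradient-descent iterations on local MSE.
    for _ in range(steps):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

def run(rounds, malicious):
    w = np.zeros(d)
    for _ in range(rounds):
        updates = [local_update(w, X, y) for X, y in clients]
        if malicious:
            # The single bad participant replaces its honest update with one
            # crafted so the server's average lands exactly on w_target
            # (assumes it can anticipate the other updates: a worst case).
            updates[-1] = n_clients * w_target - sum(updates[:-1])
        w = np.mean(updates, axis=0)  # FedAvg-style aggregation
    return w

err_honest = float(np.linalg.norm(run(20, False) - w_true))
err_attacked = float(np.linalg.norm(run(20, True) - w_true))
```

Because the server only sees the aggregate, nine honest participants cannot outvote the one malicious update here, which is the core vulnerability that defences for practical federated deployments must address.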

More content to follow soon …