We increasingly rely on systems that use machine learning to learn from their environment, often to detect anomalies in the behaviour they observe. However, the consequences of a malicious adversary targeting the machine learning algorithms themselves, by compromising part of the data from which the system learns, are poorly understood and represent a significant threat. The objective of this project is to propose systematic and realistic ways of assessing, testing and improving the robustness of machine learning algorithms to poisoning attacks. We consider both indiscriminate attacks, which aim to cause an overall degradation of the model’s performance, and targeted attacks, which aim to induce specific errors. We focus in particular on “optimal” attack strategies that seek to maximise the impact of the poisoning points, thus representing a “worst-case” scenario. At the same time, we consider sophisticated adversaries that also take detectability constraints into account.
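As a rough, hypothetical illustration of the kind of “optimal” indiscriminate attack described above, the sketch below greedily inserts label-flipped poisoning points that maximise the validation loss of a retrained logistic-regression victim under a fixed budget. The victim model, the toy dataset and the greedy candidate-pool heuristic are all assumptions made for this example; it is not the project’s actual attack formulation and it ignores detectability constraints.

```python
# Toy sketch of an indiscriminate poisoning attack (greedy approximation
# of a "worst-case" attacker). All modelling choices here are illustrative
# assumptions, not the project's method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)


def victim_loss(X_train, y_train):
    """Retrain the victim model on (possibly poisoned) data and return its validation loss."""
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return log_loss(y_val, clf.predict_proba(X_val)), clf


n_poison = 10                                            # attacker's budget
candidates = X_tr[rng.choice(len(X_tr), 50, replace=False)]
X_p, y_p = X_tr.copy(), y_tr.copy()

for _ in range(n_poison):
    # Label each candidate with the opposite of the current model's prediction,
    # then greedily keep the single point that degrades the model the most.
    base_loss, current_clf = victim_loss(X_p, y_p)
    flipped = 1 - current_clf.predict(candidates)
    best_loss, best_idx = base_loss, None
    for i, (x, y_flip) in enumerate(zip(candidates, flipped)):
        loss, _ = victim_loss(np.vstack([X_p, x]), np.append(y_p, y_flip))
        if loss > best_loss:
            best_loss, best_idx = loss, i
    if best_idx is None:                                  # no candidate helps the attacker
        break
    X_p = np.vstack([X_p, candidates[best_idx]])
    y_p = np.append(y_p, flipped[best_idx])

clean_loss, _ = victim_loss(X_tr, y_tr)
poisoned_loss, _ = victim_loss(X_p, y_p)
print(f"validation loss  clean: {clean_loss:.3f}  poisoned: {poisoned_loss:.3f}")
```

A more realistic attacker, as considered in the project, would additionally constrain the poisoning points to remain plausible with respect to the clean data distribution so that they evade detection.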
PhD Studentship funded by DSTL