A Formal Characterization of the Robustness of Deep Neural Networks to Adversarial Perturbations

01 October 2018 → 30 September 2022
Research Foundation - Flanders (FWO)
Research disciplines
  • Natural sciences
    • Artificial intelligence
  • Social sciences
    • Cognitive science and intelligent systems
Neural Networks
Project description

Deep neural networks have been immensely successful in the past few years in many AI tasks,
including computer vision, speech recognition and even game playing (e.g. defeating human
champions at the game of Go). The applications of these deep learning techniques are many and
an increasing number of important technologies depend on them. Yet virtually all deep
learning models have proven highly sensitive to so-called adversarial perturbations:
small, imperceptible modifications of a natural input that cause the model to produce
arbitrary, attacker-chosen outputs. Using black-box adversarial attacks, a malicious
actor can easily generate plausible inputs (that would raise no red flags with any
human observer) yet cause the model to behave in an attacker-specified manner, even
without any knowledge of the internals of the model in question. This poses
serious security problems that may even lead to loss of life, e.g. when a self-driving car is fooled
into thinking a stop sign is actually a speed limit sign. Much research has already gone into
devising protective measures against these attacks, but none has had lasting effectiveness. For
each new defense that is proposed, some new attack eventually bypasses it. We are in need of
robustness guarantees that provably protect our models against specific classes of adversarial
attacks, which is what I hope to achieve in this work.
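The failure mode described above can be sketched concretely. Below is a minimal white-box illustration using the fast gradient sign method (FGSM), a canonical attack from the literature, applied to a toy linear classifier rather than a deep network; the weights, input, and perturbation budget are invented for illustration and are not part of the project. The mechanism is the same: a tiny, structured change to the input, bounded in the L-infinity norm, flips the prediction.

```python
import numpy as np

# Toy linear "model": score = w @ x, predicted class = sign(score).
# All numbers here are illustrative assumptions.
w = np.array([1.0, -1.0, 0.5])   # model weights
x = np.array([0.1, 0.0, 0.1])    # a natural input, true label y = +1
y = 1.0

score = w @ x                    # 0.15 -> classified as +1 (correct)

# For the loss L = -y * score, the input gradient is dL/dx = -y * w.
# FGSM perturbs the input by eps in the direction sign(dL/dx).
eps = 0.1                        # perturbation budget (L-infinity norm)
grad = -y * w
x_adv = x + eps * np.sign(grad)  # each coordinate moves by at most eps

adv_score = w @ x_adv            # 0.15 - 2.5 * eps = -0.10 -> classified as -1
print(np.sign(score), np.sign(adv_score))
```

Although each input coordinate changes by at most 0.1, the predicted class flips, which is exactly the kind of behavior a formal robustness guarantee must rule out within a given perturbation budget.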