AdvEdge: Optimizing Adversarial Perturbations against Interpretable Deep Learning

Eldor Abdukhamidov, Mohammed Abuhamad, Firuz Juraev, Eric Chan-Tin, Tamer Abuhmed

Research output: Contribution to journal › Article › peer-review

Abstract

Deep Neural Networks (DNNs) have achieved state-of-the-art performance in various applications. It is crucial to verify that the high accuracy prediction for a given task is derived from the correct problem representation and not from the misuse of artifacts in the data. Hence, interpretation models have become a key ingredient in developing deep learning models. Utilizing interpretation models enables a better understanding of how DNN models work, and offers a sense of security. However, interpretations are also vulnerable to malicious manipulation. We present AdvEdge and AdvEdge+, two attacks to mislead the target DNNs and deceive their combined interpretation models. We evaluate the proposed attacks against two DNN model architectures coupled with four representatives of different categories of interpretation models. The experimental results demonstrate our attacks’ effectiveness in deceiving the DNN models and their interpreters.
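To make the abstract's claim concrete, the sketch below shows one common way such a joint attack can be set up: a PGD-style perturbation that is optimized both to force a target misclassification and to keep a simple input-gradient saliency map close to the benign one. This is a minimal illustration under stated assumptions (the saliency interpreter, the loss weights, and the step sizes are all hypothetical stand-ins), not the paper's exact AdvEdge or AdvEdge+ formulation.

```python
# Hedged sketch: generic joint-objective adversarial perturbation.
# Assumptions: a differentiable PyTorch classifier `model`, inputs in [0, 1],
# and plain input-gradient saliency standing in for a real interpreter.
import torch
import torch.nn.functional as F

def saliency_map(model, x, label, create_graph=False):
    """Input-gradient saliency (per-pixel importance) for the given label."""
    logits = model(x)
    score = logits.gather(1, label.view(-1, 1)).sum()
    grad, = torch.autograd.grad(score, x, create_graph=create_graph)
    return grad.abs().max(dim=1, keepdim=True)[0]

def joint_attack_sketch(model, x, y_true, y_target,
                        eps=8 / 255, alpha=1 / 255, steps=40, lam=0.1):
    """PGD loop: targeted misclassification + interpretation-consistency term."""
    # Saliency of the benign input, used as the reference the attack tries to match.
    x_ref = x.clone().requires_grad_(True)
    benign_map = saliency_map(model, x_ref, y_true).detach()

    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        x_adv = x + delta
        # (1) push the classifier toward the target label
        cls_loss = F.cross_entropy(model(x_adv), y_target)
        # (2) keep the adversarial saliency close to the benign one
        #     (create_graph=True so this term is differentiable w.r.t. delta)
        adv_map = saliency_map(model, x_adv, y_target, create_graph=True)
        int_loss = F.mse_loss(adv_map, benign_map)
        loss = cls_loss + lam * int_loss

        grad, = torch.autograd.grad(loss, delta)
        # signed gradient descent step, projected onto the eps-ball
        delta = (delta - alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)

    return (x + delta).clamp(0, 1).detach()
```

In this kind of setup, the weight `lam` trades off how strongly the perturbation is steered toward preserving the interpretation versus flipping the prediction; the actual attacks evaluated in the paper constrain and optimize the perturbation differently, so this snippet should be read only as an illustration of the joint-objective idea.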

Original language: American English
Journal: Computer Science: Faculty Publications and Other Works
Volume: 13116
DOIs
State: Published - Dec 4 2021

Keywords

  • adversarial image
  • deep learning
  • interpretability

Disciplines

  • Computer Sciences
