We propose EvoSeed, a black-box algorithmic framework based on an evolutionary strategy for
generating natural adversarial samples in an unrestricted setting.
Our results show that adversarial samples created using EvoSeed are photo-realistic and do not alter
human perception of the generated image, yet they are misclassified by a variety of robust and
non-robust classifiers.
Abstract
Deep neural networks can be exploited using natural adversarial samples, which do not impact human perception.
However, current approaches often rely on white-box access to the network to generate these adversarial samples,
or they produce samples whose distribution differs synthetically from the training distribution.
In contrast, we propose EvoSeed, a novel evolutionary strategy-based algorithmic framework for generating
photo-realistic natural adversarial samples.
Our EvoSeed framework uses auxiliary Conditional Diffusion and Classifier models to operate in a black-box
setting.
We employ the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to optimize the search for an initial
seed vector which, when processed by the Conditional Diffusion Model, yields a natural adversarial sample that
is misclassified by the Classifier Model.
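For concreteness, below is a minimal sketch of that search loop in Python using the cma package. The generate_image and classify functions are dummy placeholders standing in for the Conditional Diffusion and Classifier models, and the seed dimensionality, step size, and fitness definition are illustrative assumptions rather than the exact EvoSeed configuration.

import numpy as np
import cma  # pip install cma

LATENT_DIM = 64  # assumed size of the initial seed vector (illustrative)

def generate_image(seed):
    # Placeholder for the Conditional Diffusion Model: maps an initial
    # seed vector to a generated image. Here a trivial reshape so the
    # sketch runs end to end.
    return seed.reshape(8, 8)

def classify(image):
    # Placeholder for the Classifier Model: returns class probabilities.
    # Here a dummy two-class softmax over a pooled feature.
    logits = np.array([image.sum(), -image.sum()])
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def fitness(seed, true_class):
    # Lower is better: the confidence the classifier assigns to the true
    # class. Minimizing it drives the generated image toward
    # misclassification.
    probs = classify(generate_image(np.asarray(seed)))
    return float(probs[true_class])

def evoseed_search(true_class=0, sigma0=0.5, max_iter=50):
    # CMA-ES treats both auxiliary models as black boxes: it consumes
    # only fitness values, never gradients.
    es = cma.CMAEvolutionStrategy(np.random.randn(LATENT_DIM), sigma0,
                                  {"maxiter": max_iter, "verbose": -9})
    while not es.stop():
        seeds = es.ask()  # sample candidate initial seed vectors
        es.tell(seeds, [fitness(s, true_class) for s in seeds])
    return es.result.xbest  # seed whose generated image is most adversarial

adversarial_seed = evoseed_search()

The design point this sketch captures is that the perturbation is applied to the initial seed rather than to pixels, so the diffusion model keeps the resulting image photo-realistic.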
Experiments show that the generated adversarial images are of high quality, raising concerns about the
potential to generate harmful content that bypasses safety classifiers.
Our research opens new avenues for understanding the limitations of current safety mechanisms and the risk of
plausible attacks against classifier systems via image generation.
EvoSeed Framework
Adversarial Images for Object Classification Task
Adversarial Images bypass Safety Checkers
Adversarial Images for Ethnicity Classification Task
Adversarial Images exploiting Misalignment
Evolution of an Adversarial Image
BibTeX
@article{kotyan2024EvoSeed,
title = {Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models!},
author = {Kotyan, Shashank and Mao, Po-Yuan and Chen, Pin-Yu and Vargas, Danilo Vasconcellos},
year = {2024},
month = may,
number = {arXiv:2402.04699},
eprint = {2402.04699},
publisher = {{arXiv}},
doi = {10.48550/arXiv.2402.04699},
}