Understanding adversarial machine learning: A comprehensive breakdown of attack and defense
Adversarial attacks are a growing threat in the digital age. To understand and counter them, researchers study adversarial machine learning, which examines techniques for tricking machine learning models with deceptive data. The field covers both generating and detecting adversarial examples: inputs crafted specifically to fool a classifier. With such inputs, an attacker can interfere with a model's output and produce misleading results. Research in adversarial machine learning is therefore critical to protecting security in the digital age.
Adversarial examples are inputs to machine learning models that an attacker deliberately designs to cause misclassification. They are produced by adding small, subtle perturbations to a valid input, which makes them difficult to detect: the modified inputs look normal to a human, yet they cause the target model to misclassify them.
The following are the currently known techniques for generating adversarial examples.
1. Limited memory BFGS (L-BFGS)
Limited-memory BFGS (L-BFGS) is a gradient-based numerical optimization algorithm for nonlinear problems. In this attack it is used to find the smallest perturbation that still changes the model's prediction, i.e. to minimize the amount of perturbation added to the image (a minimal sketch appears after the pros and cons below).
Advantages: Effective at generating adversarial examples.
Disadvantages: Computationally intensive, because it solves a box-constrained optimization problem for every input; this makes the method time-consuming and often impractical.
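As a rough illustration (not a reference implementation), the sketch below frames the attack as a box-constrained optimization solved with SciPy's L-BFGS-B. The PyTorch classifier `model`, the single image `x0` with pixels in [0, 1], the `target_class`, and the trade-off constant `c` are all illustrative assumptions; the original attack additionally searches over `c`, which is omitted here.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.optimize import minimize

def lbfgs_attack(model, x0, target_class, c=0.1):
    # Minimize  c * ||x - x0||^2 + loss(model(x), target_class)
    # subject to the box constraint 0 <= x <= 1 on every pixel.
    shape = x0.shape
    x0_flat = x0.detach().numpy().astype(np.float64).ravel()

    def objective(x_flat):
        x = torch.tensor(x_flat.reshape(shape), dtype=torch.float32,
                         requires_grad=True)
        loss = c * torch.sum((x - x0) ** 2) + F.cross_entropy(
            model(x.unsqueeze(0)), torch.tensor([target_class]))
        loss.backward()
        return loss.item(), x.grad.numpy().ravel().astype(np.float64)

    result = minimize(objective, x0_flat, jac=True, method="L-BFGS-B",
                      bounds=[(0.0, 1.0)] * x0_flat.size)
    return torch.tensor(result.x.reshape(shape), dtype=torch.float32)
```

The box bounds are what make the method expensive: each adversarial example requires a full constrained optimization run.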
2. Fast Gradient Sign Method (FGSM)
A simple and fast gradient-based method for generating adversarial examples. It minimizes the maximum amount of perturbation added to any pixel of the image while still causing misclassification (see the sketch after the pros and cons below).
Advantages: Relatively efficient in computation time.
Disadvantages: Perturbation is added to every feature (every pixel of the image).
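A minimal FGSM sketch in PyTorch, assuming a differentiable classifier `model` that returns logits, a batched input `x` with pixels in [0, 1], its true `label`, and a perturbation budget `eps` (all illustrative names):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, eps=0.03):
    # One-step FGSM: move every pixel by eps in the direction of the sign
    # of the gradient of the loss with respect to the input.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

Because a single signed gradient step of size `eps` is applied to every pixel, the attack is fast but perturbs all features, which is exactly the disadvantage noted above.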
3. DeepFool attack
This untargeted adversarial example generation technique aims to minimize the Euclidean distance between the perturbed sample and the original sample. It estimates the decision boundaries between classes and adds perturbations iteratively (a simplified sketch follows the pros and cons below).
Advantages: Effective at generating adversarial examples, with smaller perturbations and a higher misclassification rate.
Disadvantages: More computationally expensive than FGSM and JSMA. In addition, the adversarial examples it produces may not be optimal.
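Below is a simplified single-image DeepFool sketch in PyTorch, assuming a classifier `model` that takes a batch of size one and returns logits over `num_classes` classes. The per-class gradient loop is written for clarity rather than speed, and the `overshoot` factor nudges the input just past the estimated boundary.

```python
import torch

def deepfool(model, x, num_classes=10, max_iter=50, overshoot=0.02):
    # Iteratively push x across the nearest (linearized) decision boundary.
    x_adv = x.clone().detach()
    orig_label = model(x_adv).argmax(1).item()
    r_total = torch.zeros_like(x)
    for _ in range(max_iter):
        x_adv = x_adv.detach().requires_grad_(True)
        logits = model(x_adv)[0]
        if logits.argmax().item() != orig_label:
            break                                   # already misclassified
        grads = [torch.autograd.grad(logits[k], x_adv, retain_graph=True)[0]
                 for k in range(num_classes)]
        best_pert, best_dist = None, float("inf")
        for k in range(num_classes):
            if k == orig_label:
                continue
            w = grads[k] - grads[orig_label]        # normal of boundary k
            f = (logits[k] - logits[orig_label]).item()
            dist = abs(f) / (w.norm().item() + 1e-8)
            if dist < best_dist:                    # keep the closest boundary
                best_dist = dist
                best_pert = dist * w / (w.norm() + 1e-8)
        r_total = r_total + best_pert
        x_adv = x + (1 + overshoot) * r_total       # step just past the boundary
    return x_adv.detach()
```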
4. Carlini & Wagner (C&W) attack
The C&W technique builds on the L-BFGS attack, but drops the box constraints and uses a different objective function. This makes the method more effective at generating adversarial examples; it has been shown to defeat state-of-the-art defenses such as adversarial training (a condensed sketch follows the pros and cons below).
Advantages: Very effective in generating adversarial examples. Additionally, it can defeat some adversarial defenses.
Disadvantages: More computationally expensive than FGSM, JSMA, and DeepFool.
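A condensed sketch of the C&W L2 variant in PyTorch. It assumes a classifier `model` returning logits, a batched input `x` in [0, 1], and true labels `label`; the full attack also binary-searches over the constant `c` and tracks the best perturbation found, both omitted here.

```python
import torch
import torch.nn.functional as F

def cw_l2_attack(model, x, label, c=1.0, steps=200, lr=0.01, kappa=0.0):
    # Change of variables x_adv = 0.5*(tanh(w) + 1) keeps pixels in [0, 1]
    # without an explicit box constraint (unlike L-BFGS).
    w = torch.atanh((2 * x - 1).clamp(-0.999, 0.999)).detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)
        one_hot = F.one_hot(label, num_classes=logits.size(1)).bool()
        true_logit = logits[one_hot]                       # logit of the true class
        other_logit = logits.masked_fill(one_hot, -1e4).max(1).values
        # Margin objective: push the true-class logit below the best other logit.
        attack_term = torch.clamp(true_logit - other_logit + kappa, min=0)
        loss = ((x_adv - x) ** 2).sum() + c * attack_term.sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```

The margin-based objective, rather than plain cross-entropy, is what makes this attack effective against many defenses.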
5. Generative Adversarial Network (GAN)
Generative adversarial networks (GANs) have been used to generate adversarial attacks. A GAN consists of two neural networks that compete with each other: one acts as a generator and the other as a discriminator. The two networks play a zero-sum game in which the generator tries to produce samples that the discriminator will misclassify as real, while the discriminator tries to distinguish real samples from those created by the generator (a minimal training-step sketch follows the pros and cons below).
Advantages: Can generate samples that differ from those used in training.
Disadvantages: Training a generative adversarial network is computationally expensive and can be very unstable.
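A minimal sketch of one GAN training step in PyTorch, illustrating the zero-sum game described above. The tiny fully connected generator and discriminator and their dimensions are placeholders, not a prescription for any particular attack GAN (such as AdvGAN).

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 32
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def gan_step(real_batch):
    n = real_batch.size(0)
    # Discriminator update: label real samples 1 and generated samples 0.
    fake = G(torch.randn(n, latent_dim)).detach()
    d_loss = bce(D(real_batch), torch.ones(n, 1)) + bce(D(fake), torch.zeros(n, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator update: try to make the discriminator call generated samples real.
    g_loss = bce(D(G(torch.randn(n, latent_dim))), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Example: one step on a random "real" batch of 8 samples.
d_l, g_l = gan_step(torch.randn(8, data_dim))
```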
6. Zeroth-Order Optimization (ZOO) attack
The ZOO technique estimates the gradients of a classifier without access to its gradients or internals, making it well suited to black-box attacks. The method estimates gradients and the Hessian by querying the target model with individually modified features, and uses Adam or Newton's method to optimize the perturbation (a toy coordinate-update sketch follows the pros and cons below).
Advantages: Performance similar to the C&W attack, with no need to train surrogate models or obtain information about the classifier's internals.
Disadvantages: Requires a large number of queries to the target classifier.
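A toy sketch of one ZOO coordinate update in NumPy. `query_loss` stands in for an attack loss computed purely from the black-box model's outputs (for example, a C&W-style margin on the returned probabilities); the published attack uses per-coordinate Adam or Newton updates and batched queries, which are simplified to plain coordinate descent here.

```python
import numpy as np

def zoo_coordinate_step(query_loss, x, lr=0.01, h=1e-4, rng=None):
    # One ZOO update: pick a random coordinate, estimate the partial
    # derivative of the attack loss with symmetric finite differences
    # (two model queries), then take a coordinate-descent step.
    rng = rng or np.random.default_rng()
    i = rng.integers(x.size)
    e = np.zeros_like(x)
    e.flat[i] = h
    g_hat = (query_loss(x + e) - query_loss(x - e)) / (2 * h)
    x_new = x.copy()
    x_new.flat[i] -= lr * g_hat
    return np.clip(x_new, 0.0, 1.0)
```

Each update touches a single coordinate and costs two queries, which is why the attack needs so many queries in total.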
A white-box attack is a scenario where the attacker has full access to the target model, including the model’s architecture and its parameters. A black-box attack is a scenario where the attacker has no access to the model and can only observe the output of the target model.
Many different adversarial attacks can be used against machine learning systems. Many of them work against both deep learning systems and traditional machine learning models such as support vector machines (SVM) and linear regression. Most adversarial attacks aim to degrade a classifier's performance on a specific task, essentially to "fool" the machine learning algorithm; adversarial machine learning is the field that studies this class of attacks. The main types of adversarial machine learning attacks are as follows:
1. Poisoning attack
The attacker manipulates the training data or its labels, causing the model to perform poorly at deployment time; poisoning is essentially adversarial contamination of the training data. Because ML systems can be retrained on data collected during operation, an attacker may be able to poison that data by injecting malicious samples during operation, thereby corrupting or influencing the retraining (a simple label-flipping sketch follows below).
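As a simple illustration of data poisoning, the sketch below flips the labels of a random fraction of a training set before the model is (re)trained on it. The function and parameter names are illustrative, and real poisoning attacks are usually far more targeted.

```python
import numpy as np

def flip_labels(y, flip_fraction=0.1, num_classes=10, seed=0):
    # Randomly reassign the labels of a fraction of the training set
    # before the model is (re)trained on it.
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    n_flip = int(flip_fraction * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_poisoned[idx] = rng.integers(0, num_classes, size=n_flip)
    return y_poisoned
```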
2. Evasion attack
Evasion attacks are the most common and most researched type of attack. The attacker manipulates data at deployment time to fool a previously trained classifier. Because they are executed during the deployment phase, they are the most practical attack type and the one most commonly used in intrusion and malware scenarios. Attackers often try to evade detection by obfuscating the content of malware or spam emails: samples are modified so that they are classified as legitimate, without directly affecting the training data. Spoofing attacks against biometric verification systems are an example of evasion.
3. Model extraction
Model theft, or model extraction, involves an attacker probing a black-box machine learning system in order to reconstruct the model or extract information about the data it was trained on. This matters especially when the training data or the model itself is sensitive and confidential. For example, model extraction can be used to steal a stock market prediction model, which the adversary can then exploit for financial gain (a minimal surrogate-training sketch follows below).
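A minimal extraction sketch using scikit-learn: the attacker labels inputs of their own choosing by querying the victim model and fits a local surrogate that imitates its decisions. `query_fn` is a placeholder for the black-box prediction API.

```python
from sklearn.neural_network import MLPClassifier

def extract_model(query_fn, X_query):
    # Label attacker-chosen inputs by querying the black-box model,
    # then fit a local surrogate that imitates its decisions.
    y_stolen = query_fn(X_query)          # predictions returned by the victim API
    surrogate = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300)
    surrogate.fit(X_query, y_stolen)
    return surrogate
```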