
Adversarial attacks and defenses in deep reinforcement learning

WBOY
2023-04-08 23:11:07

01 Foreword

This article covers work on adversarial attacks against deep reinforcement learning. The authors study the robustness of deep reinforcement learning policies to adversarial attacks from the perspective of robust optimization. Under the robust optimization framework, the optimal adversarial attack is given by minimizing the expected return of the policy, and correspondingly, a good defense mechanism is obtained by improving the policy's performance in the worst case.

Considering that attackers are usually unable to attack in the training environment, the authors propose a greedy attack algorithm that attempts to minimize the expected return of the policy without interacting with the environment; in addition, they propose a defense algorithm that adversarially trains deep reinforcement learning agents via a max-min game.

Experimental results in Atari game environments show that the proposed adversarial attack algorithm is more effective than existing attack algorithms, driving the policy's return lower, and that the policies produced by the proposed adversarial defense algorithm are more robust to a range of adversarial attacks than those of existing defense methods.

02 Preliminary knowledge

2.1 Adversarial attack

Given any sample (x, y) and a neural network f, the objective for generating an adversarial example is:

$$\max_{\delta\in\Delta}\ L\big(f_\theta(x+\delta),\,y\big)$$

where $\theta$ denotes the parameters of the neural network f, $L$ is the loss function, $\Delta=\{\delta:\|\delta\|_p\le\epsilon\}$ is the set of admissible adversarial perturbations, and $B_p(x,\epsilon)$ is the norm-constraint ball centered at $x$ with radius $\epsilon$ (so that $x+\delta\in B_p(x,\epsilon)$). Adversarial examples are generated with the PGD attack as follows:

$$x^{k+1} = \Pi_{B_p(x,\epsilon)}\Big(x^{k} + \alpha\,\mathrm{sign}\big(\nabla_{x}L\big(f_\theta(x^{k}),\,y\big)\big)\Big)$$

where $\Pi_{B_p(x,\epsilon)}$ denotes the projection operation: if the input lies outside the norm ball, it is projected back onto the ball centered at $x$ with radius $\epsilon$; $\alpha$ is the step size of a single PGD iteration.
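Below is a minimal sketch of an $\ell_\infty$ PGD attack in this spirit, assuming a PyTorch classifier and cross-entropy loss; the function name and hyperparameter values are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """l_inf PGD: repeatedly ascend the loss, then project back into the eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()   # single ascent step of size alpha
        x_adv = torch.clamp(x_adv, x - eps, x + eps)   # projection onto the l_inf ball around x
        x_adv = torch.clamp(x_adv, 0.0, 1.0)           # keep inputs in a valid pixel range
    return x_adv.detach()
```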

2.2 Reinforcement Learning and Policy Gradients

A reinforcement learning problem can be described as a Markov decision process (MDP), defined by the tuple $(S, A, P, r, \gamma)$, where $S$ is the state space, $A$ is the action space, $P(s'\mid s,a)$ is the state-transition probability, $r$ is the reward function, and $\gamma\in(0,1)$ is the discount factor. The goal of reinforcement learning is to learn a parameterized policy distribution $\pi_\theta(a\mid s)$ that maximizes the value function

$$V^{\pi_\theta}(s_0) = \mathbb{E}_{a_t\sim\pi_\theta(\cdot\mid s_t)}\Big[\sum_{t=0}^{\infty}\gamma^{t}\,r(s_t,a_t)\Big]$$

where $s_0$ denotes the initial state. Reinforcement learning also involves evaluating the action-value function

$$Q^{\pi_\theta}(s,a) = \mathbb{E}_{\pi_\theta}\Big[\sum_{t=0}^{\infty}\gamma^{t}\,r(s_t,a_t)\ \Big|\ s_0=s,\ a_0=a\Big]$$

which is the expected return obtained by executing action $a$ in state $s$ and following the policy $\pi_\theta$ thereafter. From the definitions, the value function and the action-value function satisfy the following relationship:

$$V^{\pi_\theta}(s) = \mathbb{E}_{a\sim\pi_\theta(\cdot\mid s)}\big[Q^{\pi_\theta}(s,a)\big] = \sum_{a\in A}\pi_\theta(a\mid s)\,Q^{\pi_\theta}(s,a)$$

For ease of exposition, the authors mainly focus on MDPs with discrete action spaces, but all algorithms and results can be applied directly to the continuous setting.
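As a toy illustration of this identity (all numbers below are made up), the value of a state is just the policy-weighted average of its action values:

```python
import numpy as np

# One state with three discrete actions (values are made up).
Q = np.array([1.0, 0.5, -0.2])              # Q^pi(s, a) for each action a
logits = np.array([2.0, 1.0, 0.0])          # policy network logits for state s
pi = np.exp(logits) / np.exp(logits).sum()  # pi_theta(a | s) via softmax

V = float(pi @ Q)                           # V^pi(s) = sum_a pi(a|s) * Q^pi(s, a)
print(V)
```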

03 Method

Adversarial attacks on, and defenses of, deep reinforcement learning policies are both formulated under the following robust optimization framework:

$$\max_{\theta}\ \min_{\bar\delta\in\bar\Delta}\ \mathbb{E}_{a_t\sim\pi_\theta(\cdot\mid s_t+\delta_t)}\Big[\sum_{t=0}^{\infty}\gamma^{t}\,r(s_t,a_t)\Big]$$

where $\bar\delta=\{\delta_0,\delta_1,\dots\}$ denotes a sequence of adversarial perturbations, $\bar\Delta$ denotes the set of admissible perturbation sequences, and every $\delta_t$ satisfies $\|\delta_t\|_p\le\epsilon$. The above formula provides a unified framework for adversarial attacks and defenses in deep reinforcement learning.

On the one hand, the inner minimization seeks adversarial perturbation sequences that make the current policy take wrong decisions; on the other hand, the outer maximization seeks policy parameters that maximize the expected return under the perturbed policy. Through this attack-defense game, the policy parameters obtained during training become more resistant to adversarial attacks.

The inner minimization of the objective generates adversarial perturbations, but learning the optimal adversarial perturbation with a reinforcement learning algorithm is very time- and labor-consuming, and the training environment is a black box to the attacker. The authors therefore consider a practical setting in which the attacker injects perturbations state by state. In the supervised learning attack scenario, the attacker only needs to fool the classifier into producing a wrong label; in the reinforcement learning attack scenario, the action-value function provides the attacker with additional information, namely that a small action value leads to a small expected return. Accordingly, the authors define the optimal adversarial perturbation in deep reinforcement learning as follows.

Definition 1: An optimal adversarial perturbation on state s minimizes the expected return from that state:

$$\delta^{*}(s) = \arg\min_{\|\delta\|_p\le\epsilon}\ \mathbb{E}_{a\sim\pi_\theta(\cdot\mid s+\delta)}\big[Q^{\pi_\theta}(s,a)\big]$$

Note that solving the above problem directly is tricky: it requires the attacker to deceive the agent into choosing the worst action, yet the agent's action-value function is unknown to the attacker, so there is no guarantee that the resulting perturbation is optimal. The following theorem shows that if the policy is optimal, the optimal adversarial perturbation can be generated without access to the action-value function.

Theorem 1: When the policy $\pi_\theta$ is optimal, the action-value function and the policy satisfy the following relationship:

$$Q^{\pi_\theta}(s,a) = \lambda(s)\,\log\pi_\theta(a\mid s) + c(s)$$

where $\mathcal{H}(\pi_\theta(\cdot\mid s))$ denotes the policy entropy, $c(s)$ is a state-dependent constant, and the coefficient $\lambda(s)\ge 0$ goes to 0 as the policy entropy goes to 0. Furthermore, combining this relationship with Definition 1, the optimal perturbation can be written as

$$\delta^{*}(s) = \arg\max_{\|\delta\|_p\le\epsilon}\ \mathcal{H}\big(\pi_\theta(\cdot\mid s+\delta),\ \pi_\theta(\cdot\mid s)\big),$$

i.e., it maximizes the cross-entropy between the perturbed policy and the original policy.

Proof: When the stochastic policy $\pi_\theta$ is optimal, the value function $V^{\pi_\theta}$ is also optimal, which means that in every state $s$ no other action distribution can increase the value function. Accordingly, given the optimal action-value function $Q^{\pi_\theta}$, the optimal policy can be obtained by solving the constrained optimization problem

$$\begin{aligned}
\max_{\pi}\quad & \sum_{a\in A}\pi(a\mid s)\,Q^{\pi_\theta}(s,a)\\
\text{s.t.}\quad & \sum_{a\in A}\pi(a\mid s)=1\\
& \pi(a\mid s)\ge 0,\quad\forall a\in A\\
& \mathcal{H}\big(\pi(\cdot\mid s)\big)\ge\eta
\end{aligned}$$

The second and third lines state that $\pi(\cdot\mid s)$ is a probability distribution, and the last line states that the policy is stochastic (its entropy is at least $\eta>0$). According to the KKT conditions, the above optimization problem can be transformed into the following form:

$$\mathcal{L}(\pi,\lambda,\mu,c) = \sum_{a\in A}\pi(a\mid s)\,Q^{\pi_\theta}(s,a) + \lambda\Big(\mathcal{H}\big(\pi(\cdot\mid s)\big)-\eta\Big) + \sum_{a\in A}\mu_a\,\pi(a\mid s) + c\Big(1-\sum_{a\in A}\pi(a\mid s)\Big)$$

$$\frac{\partial\mathcal{L}}{\partial\pi(a\mid s)} = Q^{\pi_\theta}(s,a) - \lambda\big(\log\pi(a\mid s)+1\big) + \mu_a - c = 0$$

where $\lambda\ge 0$, $\mu_a\ge 0$ and $c$ are the Lagrange multipliers of the entropy constraint, the non-negativity constraints and the normalization constraint, respectively. Assume that $\pi(a\mid s)$ is strictly positive for all actions $a$. When $\pi(a\mid s)>0$, complementary slackness forces

$$\mu_a\,\pi(a\mid s)=0\ \Longrightarrow\ \mu_a=0,\qquad\forall a\in A,$$

and the stationarity condition for every action $a$ then reduces to $Q^{\pi_\theta}(s,a)=\lambda\big(\log\pi(a\mid s)+1\big)+c$, from which we obtain the softmax relationship between the action-value function and the policy:

$$\pi_\theta(a\mid s) = \frac{\exp\big(Q^{\pi_\theta}(s,a)/\lambda\big)}{\sum_{a'\in A}\exp\big(Q^{\pi_\theta}(s,a')/\lambda\big)}$$

Taking logarithms of the softmax relationship gives

$$\log\pi_\theta(a\mid s) = \frac{Q^{\pi_\theta}(s,a)}{\lambda} - \log\sum_{a'\in A}\exp\big(Q^{\pi_\theta}(s,a')/\lambda\big),$$

and then

$$Q^{\pi_\theta}(s,a) = \lambda\,\log\pi_\theta(a\mid s) + \lambda\,\log\sum_{a'\in A}\exp\big(Q^{\pi_\theta}(s,a')/\lambda\big).$$

Rewriting this in the form of Theorem 1, we have

$$Q^{\pi_\theta}(s,a) = \lambda(s)\,\log\pi_\theta(a\mid s) + c(s),$$

where

$$\lambda(s)=\lambda,\qquad c(s)=\lambda\,\log\sum_{a'\in A}\exp\big(Q^{\pi_\theta}(s,a')/\lambda\big).$$

In the above, $\pi_\theta(\cdot\mid s)$ is a probability distribution in softmax form whose entropy equals $\eta$. When $\eta$ equals 0, $\lambda(s)$ also becomes 0; when $\eta$ is greater than 0, $\lambda(s)>0$ and the softmax relationship between $Q^{\pi_\theta}$ and $\pi_\theta$ holds, which completes the proof.
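As a quick numerical sanity check of this softmax relationship (all numbers below are made up), a policy of the form $\pi=\mathrm{softmax}(Q/\lambda)$ does satisfy $Q(s,a)=\lambda\log\pi(a\mid s)+c(s)$ with the same constant $c(s)$ for every action:

```python
import numpy as np

Q = np.array([1.0, 0.5, -0.2])   # made-up Q^pi(s, .) for one state
lam = 0.3                        # made-up temperature lambda(s) > 0

pi = np.exp(Q / lam) / np.exp(Q / lam).sum()   # pi(a|s) = softmax(Q / lambda)

# Q(s, a) - lambda * log pi(a|s) is the same state-dependent constant c(s) for every action.
c = Q - lam * np.log(pi)
print(c)   # three identical values (up to floating-point error)
```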

Theorem 1 shows that if the policy is optimal, the optimal perturbation can be obtained by maximizing the cross-entropy between the perturbed policy and the original policy. For simplicity of discussion, the authors call the attack of Theorem 1 a policy attack, and they use the PGD framework to compute the optimal policy attack. The specific procedure is shown in Algorithm 1 below.

[Algorithm 1: PGD-based policy attack]
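Below is a minimal sketch of such a policy attack, assuming a PyTorch policy network that outputs action logits; the cross-entropy objective and the $\ell_\infty$ projection follow the description above, while the function name, the direction of the cross-entropy, and the hyperparameter values are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F

def policy_attack(policy_net, s, eps=0.01, alpha=0.0025, steps=10):
    """PGD that maximizes the cross-entropy between the perturbed policy
    pi(.|s + delta) and the original (unperturbed) policy pi(.|s)."""
    with torch.no_grad():
        log_pi_orig = F.log_softmax(policy_net(s), dim=-1)   # log pi_theta(.|s), held fixed
    s_adv = s.clone().detach()
    for _ in range(steps):
        s_adv.requires_grad_(True)
        pi_adv = F.softmax(policy_net(s_adv), dim=-1)        # pi_theta(.|s + delta)
        # Cross-entropy H(pi_adv, pi_orig) = -sum_a pi_adv(a) * log pi_orig(a)
        loss = -(pi_adv * log_pi_orig).sum(dim=-1).mean()
        grad = torch.autograd.grad(loss, s_adv)[0]
        s_adv = s_adv.detach() + alpha * grad.sign()         # ascend the cross-entropy
        s_adv = torch.clamp(s_adv, s - eps, s + eps)         # project onto the l_inf ball around s
    return s_adv.detach()
```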

The robust optimization procedure the authors propose to defend against such perturbations is shown in Algorithm 2 below; the algorithm is called adversarial training with policy attack (ATPA). During the training phase, the perturbed policy is used to interact with the environment, while the action-value function of the perturbed policy is estimated to aid policy training.

Specifically, the authors use the policy attack to generate perturbations during training even though it is not guaranteed to reduce the value function at that point: in the early stages of training, the policy may bear little relation to the action-value function, but as training progresses they gradually come to satisfy the softmax relationship of Theorem 1.

On the other hand, accurately estimating the action-value function of the unperturbed policy is difficult, because the trajectories are collected by running the perturbed policy, and using these data to estimate the action-value function of the unperturbed policy can be very inaccurate.

The perturbed policy is optimized with the PPO objective

$$L(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\big)\Big],$$

where $r_t(\theta) = \dfrac{\pi_\theta(a_t\mid s_t+\delta_t)}{\pi_{\theta_{\text{old}}}(a_t\mid s_t+\delta_t)}$ and $\hat{A}_t$ is an estimate of the advantage function of the perturbed policy; in practice, $\hat{A}_t$ is estimated with GAE. The overall procedure is shown in Algorithm 2 below.

[Algorithm 2: adversarial training with policy attack (ATPA)]
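The sketch below illustrates the training-phase data collection that Algorithm 2 describes: every observation is adversarially perturbed by the policy attack before the agent acts, and the resulting trajectory is later used for the GAE and PPO updates (omitted here). It assumes the Gymnasium-style environment API and the `policy_attack` function from the earlier sketch; none of the names are from the paper.

```python
import torch

def atpa_rollout(env, policy_net, attack_fn, horizon=128):
    """Collect one on-policy rollout in which each observation is perturbed
    by the policy attack before the agent chooses its action."""
    obs, _ = env.reset()
    trajectory = []
    for _ in range(horizon):
        s = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
        s_adv = attack_fn(policy_net, s)                 # e.g. the policy_attack sketched above
        with torch.no_grad():
            dist = torch.distributions.Categorical(logits=policy_net(s_adv))
            a = dist.sample()
        obs, reward, terminated, truncated, _ = env.step(a.item())
        trajectory.append((s_adv, a, reward))            # fed to the GAE / PPO update afterwards
        if terminated or truncated:
            obs, _ = env.reset()
    return trajectory
```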

04 Experimental results

The three sub-figures below show the results under different attack perturbations. Both the adversarially trained policies and the standard policy are resistant to random perturbations. In contrast, adversarial attacks degrade the performance of the different policies, with results that depend on the test environment and the defense algorithm; furthermore, the performance gap between the three adversarial attack algorithms is small.

In contrast, in the relatively difficult settings, policies perturbed by the authors' attack algorithm obtain much lower returns. Overall, the policy attack proposed in the paper produces the lowest reward in most cases, indicating that it is indeed the most effective of all tested adversarial attack algorithms.

[Figure: expected returns of standard and adversarially trained policies under random and adversarial perturbations]

The figure below shows the learning curves of the different defense algorithms and of standard PPO. Note that the curves only show the expected return of the policy used to interact with the environment. Among all training algorithms, the ATPA proposed in the paper has the lowest training variance and is therefore more stable than the others. Also note that ATPA progresses much more slowly than standard PPO, especially in the early training stages; this is because being adversarially perturbed early in training makes policy training very unstable.

[Figure: learning curves of ATPA, other defense algorithms, and standard PPO]

The table below summarizes the expected returns, under different perturbations, of policies trained with the different algorithms. Policies trained with ATPA are resistant to a variety of adversarial perturbations; by comparison, although StageWise and DataAugment learn to handle adversarial attacks to some extent, they are not as effective as ATPA in all cases.

[Table: expected returns of policies trained with different defense algorithms under different perturbations]

For a broader comparison, the authors also evaluate the robustness of these defense algorithms to varying strengths of adversarial perturbation generated by the most effective policy attack. As shown below, ATPA again obtains the highest scores in all cases. In addition, the evaluation variance of ATPA is much smaller than that of StageWise and DataAugment, indicating stronger generalization ability.

[Figure: robustness of defense algorithms under varying strengths of policy-attack perturbation]

To achieve similar performance, ATPA requires more training data than the standard PPO algorithm. The authors dig into this issue by studying the stability of the perturbed policy: they compute the KL divergence between the perturbed policies obtained by running the PGD policy attack from different random initial points, in the middle and at the end of training. As shown in the figure below, without adversarial training, large KL divergence values are observed even after standard PPO has converged, indicating that the policy is very unstable to perturbations produced by PGD started from different initial points.

[Figure: KL divergence of perturbed policies obtained from different PGD initial points, with and without adversarial training]

The figure below shows the KL divergence maps of perturbed policies obtained from different initial points. Each pixel in the figure is the KL divergence between two perturbed policies, each obtained by maximizing the core objective of the ATPA algorithm. Note that since KL divergence is an asymmetric measure, these maps are also asymmetric.
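For reference, a minimal way to compute one entry of such a map between two discrete perturbed policies is sketched below; the probability vectors are made up, and swapping the arguments illustrates the asymmetry mentioned above.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) between two discrete action distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

pi_delta_1 = [0.7, 0.2, 0.1]   # perturbed policy from PGD initial point 1 (made up)
pi_delta_2 = [0.5, 0.3, 0.2]   # perturbed policy from PGD initial point 2 (made up)

print(kl_divergence(pi_delta_1, pi_delta_2))   # KL(pi_1 || pi_2)
print(kl_divergence(pi_delta_2, pi_delta_1))   # different value: KL is asymmetric
```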

[Figure: KL divergence maps between perturbed policies from different initial points]


Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn for deletion.