
Adversarial attacks and defenses in deep reinforcement learning

WBOY
2023-04-08 23:11:07

01 Foreword

This article covers work on adversarial attacks against deep reinforcement learning. The authors study the robustness of deep reinforcement learning policies to adversarial attacks from the perspective of robust optimization. Under the robust optimization framework, the optimal adversarial attack is given by minimizing the expected return of the policy, and correspondingly, a good defense mechanism is obtained by improving the policy's performance in the worst case.

Considering that attackers are usually unable to attack in the training environment, the authors propose a greedy attack algorithm that attempts to minimize the expected return of the policy without interacting with the environment; in addition, they propose a defense algorithm that adversarially trains deep reinforcement learning agents via a max-min game.

Experimental results in Atari game environments show that the proposed adversarial attack algorithm is more effective than existing attack algorithms, driving the policy's return lower, and that the policies produced by the proposed adversarial defense algorithm are more robust to a range of adversarial attacks than those of existing defense methods.

02 Preliminary knowledge

2.1 Adversarial attack

Given any sample (x, y) and a neural network f, the objective for generating an adversarial example is:

$$\max_{\delta\in\Delta}\ L\big(f_\theta(x+\delta),\,y\big)$$

where $\theta$ denotes the parameters of the neural network f, $L$ is the loss function, $\Delta=\{\delta:\|\delta\|_p\le\epsilon\}$ is the set of admissible adversarial perturbations, and $B_p(x,\epsilon)$ is the norm-constraint ball centered at $x$ with radius $\epsilon$ (so that $x+\delta\in B_p(x,\epsilon)$). Adversarial examples are generated with the PGD attack as follows:

$$x^{k+1} = \Pi_{B_p(x,\epsilon)}\Big(x^{k} + \alpha\,\mathrm{sign}\big(\nabla_{x}L\big(f_\theta(x^{k}),\,y\big)\big)\Big)$$

where $\Pi_{B_p(x,\epsilon)}$ denotes the projection operation: if the input lies outside the norm ball, it is projected back onto the ball centered at $x$ with radius $\epsilon$; $\alpha$ is the step size of a single PGD iteration.
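Below is a minimal sketch of an $\ell_\infty$ PGD attack in this spirit, assuming a PyTorch classifier and cross-entropy loss; the function name and hyperparameter values are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """l_inf PGD: repeatedly ascend the loss, then project back into the eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()   # single ascent step of size alpha
        x_adv = torch.clamp(x_adv, x - eps, x + eps)   # projection onto the l_inf ball around x
        x_adv = torch.clamp(x_adv, 0.0, 1.0)           # keep inputs in a valid pixel range
    return x_adv.detach()
```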

2.2 Reinforcement Learning and Policy Gradients

A reinforcement learning problem can be described as a Markov decision process (MDP), defined by the tuple $(S, A, P, r, \gamma)$, where $S$ is the state space, $A$ is the action space, $P(s'\mid s,a)$ is the state-transition probability, $r$ is the reward function, and $\gamma\in(0,1)$ is the discount factor. The goal of reinforcement learning is to learn a parameterized policy distribution $\pi_\theta(a\mid s)$ that maximizes the value function

$$V^{\pi_\theta}(s_0) = \mathbb{E}_{a_t\sim\pi_\theta(\cdot\mid s_t)}\Big[\sum_{t=0}^{\infty}\gamma^{t}\,r(s_t,a_t)\Big]$$

where $s_0$ denotes the initial state. Reinforcement learning also involves evaluating the action-value function

$$Q^{\pi_\theta}(s,a) = \mathbb{E}_{\pi_\theta}\Big[\sum_{t=0}^{\infty}\gamma^{t}\,r(s_t,a_t)\ \Big|\ s_0=s,\ a_0=a\Big]$$

which is the expected return obtained by executing action $a$ in state $s$ and following the policy $\pi_\theta$ thereafter. From the definitions, the value function and the action-value function satisfy the following relationship:

$$V^{\pi_\theta}(s) = \mathbb{E}_{a\sim\pi_\theta(\cdot\mid s)}\big[Q^{\pi_\theta}(s,a)\big] = \sum_{a\in A}\pi_\theta(a\mid s)\,Q^{\pi_\theta}(s,a)$$

For ease of exposition, the authors mainly focus on MDPs with discrete action spaces, but all algorithms and results can be applied directly to the continuous setting.
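As a toy illustration of this identity (all numbers below are made up), the value of a state is just the policy-weighted average of its action values:

```python
import numpy as np

# One state with three discrete actions (values are made up).
Q = np.array([1.0, 0.5, -0.2])              # Q^pi(s, a) for each action a
logits = np.array([2.0, 1.0, 0.0])          # policy network logits for state s
pi = np.exp(logits) / np.exp(logits).sum()  # pi_theta(a | s) via softmax

V = float(pi @ Q)                           # V^pi(s) = sum_a pi(a|s) * Q^pi(s, a)
print(V)
```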

03 Method

Adversarial attacks on, and defenses of, deep reinforcement learning policies are both formulated under the following robust optimization framework:

$$\max_{\theta}\ \min_{\bar\delta\in\bar\Delta}\ \mathbb{E}_{a_t\sim\pi_\theta(\cdot\mid s_t+\delta_t)}\Big[\sum_{t=0}^{\infty}\gamma^{t}\,r(s_t,a_t)\Big]$$

where $\bar\delta=\{\delta_0,\delta_1,\dots\}$ denotes a sequence of adversarial perturbations, $\bar\Delta$ denotes the set of admissible perturbation sequences, and every $\delta_t$ satisfies $\|\delta_t\|_p\le\epsilon$. The above formula provides a unified framework for adversarial attacks and defenses in deep reinforcement learning.

On the one hand, the inner minimization seeks adversarial perturbation sequences that make the current policy take wrong decisions; on the other hand, the outer maximization seeks policy parameters that maximize the expected return under the perturbed policy. Through this attack-defense game, the policy parameters obtained during training become more resistant to adversarial attacks.

The inner minimization of the objective generates adversarial perturbations, but learning the optimal adversarial perturbation with a reinforcement learning algorithm is very time- and labor-consuming, and the training environment is a black box to the attacker. The authors therefore consider a practical setting in which the attacker injects perturbations state by state. In the supervised learning attack scenario, the attacker only needs to fool the classifier into producing a wrong label; in the reinforcement learning attack scenario, the action-value function provides the attacker with additional information, namely that a small action value leads to a small expected return. Accordingly, the authors define the optimal adversarial perturbation in deep reinforcement learning as follows.

Definition 1: An optimal adversarial perturbation on state s minimizes the expected return from that state:

$$\delta^{*}(s) = \arg\min_{\|\delta\|_p\le\epsilon}\ \mathbb{E}_{a\sim\pi_\theta(\cdot\mid s+\delta)}\big[Q^{\pi_\theta}(s,a)\big]$$

Note that solving the above problem directly is tricky: it requires the attacker to deceive the agent into choosing the worst action, yet the agent's action-value function is unknown to the attacker, so there is no guarantee that the resulting perturbation is optimal. The following theorem shows that if the policy is optimal, the optimal adversarial perturbation can be generated without access to the action-value function.

Theorem 1: When the policy $\pi_\theta$ is optimal, the action-value function and the policy satisfy the following relationship:

$$Q^{\pi_\theta}(s,a) = \lambda(s)\,\log\pi_\theta(a\mid s) + c(s)$$

where $\mathcal{H}(\pi_\theta(\cdot\mid s))$ denotes the policy entropy, $c(s)$ is a state-dependent constant, and the coefficient $\lambda(s)\ge 0$ goes to 0 as the policy entropy goes to 0. Furthermore, combining this relationship with Definition 1, the optimal perturbation can be written as

$$\delta^{*}(s) = \arg\max_{\|\delta\|_p\le\epsilon}\ \mathcal{H}\big(\pi_\theta(\cdot\mid s+\delta),\ \pi_\theta(\cdot\mid s)\big),$$

i.e., it maximizes the cross-entropy between the perturbed policy and the original policy.

Proof: When the stochastic policy $\pi_\theta$ is optimal, the value function $V^{\pi_\theta}$ is also optimal, which means that in every state $s$ no other action distribution can increase the value function. Accordingly, given the optimal action-value function $Q^{\pi_\theta}$, the optimal policy can be obtained by solving the constrained optimization problem

$$\begin{aligned}
\max_{\pi}\quad & \sum_{a\in A}\pi(a\mid s)\,Q^{\pi_\theta}(s,a)\\
\text{s.t.}\quad & \sum_{a\in A}\pi(a\mid s)=1\\
& \pi(a\mid s)\ge 0,\quad\forall a\in A\\
& \mathcal{H}\big(\pi(\cdot\mid s)\big)\ge\eta
\end{aligned}$$

The second and third lines state that $\pi(\cdot\mid s)$ is a probability distribution, and the last line states that the policy is stochastic (its entropy is at least $\eta>0$). According to the KKT conditions, the above optimization problem can be transformed into the following form:

$$\mathcal{L}(\pi,\lambda,\mu,c) = \sum_{a\in A}\pi(a\mid s)\,Q^{\pi_\theta}(s,a) + \lambda\Big(\mathcal{H}\big(\pi(\cdot\mid s)\big)-\eta\Big) + \sum_{a\in A}\mu_a\,\pi(a\mid s) + c\Big(1-\sum_{a\in A}\pi(a\mid s)\Big)$$

$$\frac{\partial\mathcal{L}}{\partial\pi(a\mid s)} = Q^{\pi_\theta}(s,a) - \lambda\big(\log\pi(a\mid s)+1\big) + \mu_a - c = 0$$

where $\lambda\ge 0$, $\mu_a\ge 0$ and $c$ are the Lagrange multipliers of the entropy constraint, the non-negativity constraints and the normalization constraint, respectively. Assume that $\pi(a\mid s)$ is strictly positive for all actions $a$. When $\pi(a\mid s)>0$, complementary slackness forces

$$\mu_a\,\pi(a\mid s)=0\ \Longrightarrow\ \mu_a=0,\qquad\forall a\in A,$$

and the stationarity condition for every action $a$ then reduces to $Q^{\pi_\theta}(s,a)=\lambda\big(\log\pi(a\mid s)+1\big)+c$, from which we obtain the softmax relationship between the action-value function and the policy:

$$\pi_\theta(a\mid s) = \frac{\exp\big(Q^{\pi_\theta}(s,a)/\lambda\big)}{\sum_{a'\in A}\exp\big(Q^{\pi_\theta}(s,a')/\lambda\big)}$$

Taking logarithms of the softmax relationship gives

$$\log\pi_\theta(a\mid s) = \frac{Q^{\pi_\theta}(s,a)}{\lambda} - \log\sum_{a'\in A}\exp\big(Q^{\pi_\theta}(s,a')/\lambda\big),$$

and then

$$Q^{\pi_\theta}(s,a) = \lambda\,\log\pi_\theta(a\mid s) + \lambda\,\log\sum_{a'\in A}\exp\big(Q^{\pi_\theta}(s,a')/\lambda\big).$$

Rewriting this in the form of Theorem 1, we have

$$Q^{\pi_\theta}(s,a) = \lambda(s)\,\log\pi_\theta(a\mid s) + c(s),$$

where

$$\lambda(s)=\lambda,\qquad c(s)=\lambda\,\log\sum_{a'\in A}\exp\big(Q^{\pi_\theta}(s,a')/\lambda\big).$$

In the above, $\pi_\theta(\cdot\mid s)$ is a probability distribution in softmax form whose entropy equals $\eta$. When $\eta$ equals 0, $\lambda(s)$ also becomes 0; when $\eta$ is greater than 0, $\lambda(s)>0$ and the softmax relationship between $Q^{\pi_\theta}$ and $\pi_\theta$ holds, which completes the proof.
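As a quick numerical sanity check of this softmax relationship (all numbers below are made up), a policy of the form $\pi=\mathrm{softmax}(Q/\lambda)$ does satisfy $Q(s,a)=\lambda\log\pi(a\mid s)+c(s)$ with the same constant $c(s)$ for every action:

```python
import numpy as np

Q = np.array([1.0, 0.5, -0.2])   # made-up Q^pi(s, .) for one state
lam = 0.3                        # made-up temperature lambda(s) > 0

pi = np.exp(Q / lam) / np.exp(Q / lam).sum()   # pi(a|s) = softmax(Q / lambda)

# Q(s, a) - lambda * log pi(a|s) is the same state-dependent constant c(s) for every action.
c = Q - lam * np.log(pi)
print(c)   # three identical values (up to floating-point error)
```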

Theorem 1 shows that if the policy is optimal, the optimal perturbation can be obtained by maximizing the cross-entropy between the perturbed policy and the original policy. For simplicity of discussion, the authors call the attack of Theorem 1 a policy attack, and they use the PGD framework to compute the optimal policy attack. The specific procedure is shown in Algorithm 1 below.

[Algorithm 1: PGD-based policy attack]
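Below is a minimal sketch of such a policy attack, assuming a PyTorch policy network that outputs action logits; the cross-entropy objective and the $\ell_\infty$ projection follow the description above, while the function name, the direction of the cross-entropy, and the hyperparameter values are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F

def policy_attack(policy_net, s, eps=0.01, alpha=0.0025, steps=10):
    """PGD that maximizes the cross-entropy between the perturbed policy
    pi(.|s + delta) and the original (unperturbed) policy pi(.|s)."""
    with torch.no_grad():
        log_pi_orig = F.log_softmax(policy_net(s), dim=-1)   # log pi_theta(.|s), held fixed
    s_adv = s.clone().detach()
    for _ in range(steps):
        s_adv.requires_grad_(True)
        pi_adv = F.softmax(policy_net(s_adv), dim=-1)        # pi_theta(.|s + delta)
        # Cross-entropy H(pi_adv, pi_orig) = -sum_a pi_adv(a) * log pi_orig(a)
        loss = -(pi_adv * log_pi_orig).sum(dim=-1).mean()
        grad = torch.autograd.grad(loss, s_adv)[0]
        s_adv = s_adv.detach() + alpha * grad.sign()         # ascend the cross-entropy
        s_adv = torch.clamp(s_adv, s - eps, s + eps)         # project onto the l_inf ball around s
    return s_adv.detach()
```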

The robust optimization procedure the authors propose to defend against such perturbations is shown in Algorithm 2 below; the algorithm is called adversarial training with policy attack (ATPA). During the training phase, the perturbed policy is used to interact with the environment, while the action-value function of the perturbed policy is estimated to aid policy training.

Specifically, the authors use the policy attack to generate perturbations during training even though it is not guaranteed to reduce the value function at that point: in the early stages of training, the policy may bear little relation to the action-value function, but as training progresses they gradually come to satisfy the softmax relationship of Theorem 1.

On the other hand, accurately estimating the action-value function of the unperturbed policy is difficult, because the trajectories are collected by running the perturbed policy, and using these data to estimate the action-value function of the unperturbed policy can be very inaccurate.

The perturbed policy is optimized with the PPO objective

$$L(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\big)\Big],$$

where $r_t(\theta) = \dfrac{\pi_\theta(a_t\mid s_t+\delta_t)}{\pi_{\theta_{\text{old}}}(a_t\mid s_t+\delta_t)}$ and $\hat{A}_t$ is an estimate of the advantage function of the perturbed policy; in practice, $\hat{A}_t$ is estimated with GAE. The overall procedure is shown in Algorithm 2 below.

[Algorithm 2: adversarial training with policy attack (ATPA)]
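The sketch below illustrates the training-phase data collection that Algorithm 2 describes: every observation is adversarially perturbed by the policy attack before the agent acts, and the resulting trajectory is later used for the GAE and PPO updates (omitted here). It assumes the Gymnasium-style environment API and the `policy_attack` function from the earlier sketch; none of the names are from the paper.

```python
import torch

def atpa_rollout(env, policy_net, attack_fn, horizon=128):
    """Collect one on-policy rollout in which each observation is perturbed
    by the policy attack before the agent chooses its action."""
    obs, _ = env.reset()
    trajectory = []
    for _ in range(horizon):
        s = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
        s_adv = attack_fn(policy_net, s)                 # e.g. the policy_attack sketched above
        with torch.no_grad():
            dist = torch.distributions.Categorical(logits=policy_net(s_adv))
            a = dist.sample()
        obs, reward, terminated, truncated, _ = env.step(a.item())
        trajectory.append((s_adv, a, reward))            # fed to the GAE / PPO update afterwards
        if terminated or truncated:
            obs, _ = env.reset()
    return trajectory
```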

04 Experimental results

The three sub-figures below show the results under different attack perturbations. Both the adversarially trained policies and the standard policy are resistant to random perturbations. In contrast, adversarial attacks degrade the performance of the different policies, with results that depend on the test environment and the defense algorithm; furthermore, the performance gap between the three adversarial attack algorithms is small.

In contrast, in the relatively difficult settings, policies perturbed by the authors' attack algorithm obtain much lower returns. Overall, the policy attack proposed in the paper produces the lowest reward in most cases, indicating that it is indeed the most effective of all tested adversarial attack algorithms.

[Figure: expected returns of standard and adversarially trained policies under random and adversarial perturbations]

The figure below shows the learning curves of the different defense algorithms and of standard PPO. Note that the curves only show the expected return of the policy used to interact with the environment. Among all training algorithms, the ATPA proposed in the paper has the lowest training variance and is therefore more stable than the others. Also note that ATPA progresses much more slowly than standard PPO, especially in the early training stages; this is because being adversarially perturbed early in training makes policy training very unstable.

[Figure: learning curves of ATPA, other defense algorithms, and standard PPO]

The table below summarizes the expected returns, under different perturbations, of policies trained with the different algorithms. Policies trained with ATPA are resistant to a variety of adversarial perturbations; by comparison, although StageWise and DataAugment learn to handle adversarial attacks to some extent, they are not as effective as ATPA in all cases.

[Table: expected returns of policies trained with different defense algorithms under different perturbations]

For a broader comparison, the authors also evaluate the robustness of these defense algorithms to varying strengths of adversarial perturbation generated by the most effective policy attack. As shown below, ATPA again obtains the highest scores in all cases. In addition, the evaluation variance of ATPA is much smaller than that of StageWise and DataAugment, indicating stronger generalization ability.

[Figure: robustness of defense algorithms under varying strengths of policy-attack perturbation]

To achieve similar performance, ATPA requires more training data than the standard PPO algorithm. The authors dig into this issue by studying the stability of the perturbed policy: they compute the KL divergence between the perturbed policies obtained by running the PGD policy attack from different random initial points, in the middle and at the end of training. As shown in the figure below, without adversarial training, large KL divergence values are observed even after standard PPO has converged, indicating that the policy is very unstable to perturbations produced by PGD started from different initial points.

[Figure: KL divergence of perturbed policies obtained from different PGD initial points, with and without adversarial training]

The figure below shows the KL divergence maps of perturbed policies obtained from different initial points. Each pixel in the figure is the KL divergence between two perturbed policies, each obtained by maximizing the core objective of the ATPA algorithm. Note that since KL divergence is an asymmetric measure, these maps are also asymmetric.
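For reference, a minimal way to compute one entry of such a map between two discrete perturbed policies is sketched below; the probability vectors are made up, and swapping the arguments illustrates the asymmetry mentioned above.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) between two discrete action distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

pi_delta_1 = [0.7, 0.2, 0.1]   # perturbed policy from PGD initial point 1 (made up)
pi_delta_2 = [0.5, 0.3, 0.2]   # perturbed policy from PGD initial point 2 (made up)

print(kl_divergence(pi_delta_1, pi_delta_2))   # KL(pi_1 || pi_2)
print(kl_divergence(pi_delta_2, pi_delta_1))   # different value: KL is asymmetric
```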

[Figure: KL divergence maps between perturbed policies from different initial points]


Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn for deletion.