Home >Technology peripherals >AI >Exploration model based on adversarial gradient and its application in click prediction

Exploration model based on adversarial gradient and its application in click prediction

WBOY
WBOYforward
2023-04-13 23:34:011620browse

1. Abstract

Ranking models play a vital role in advertising, recommendation and search systems. In the ranking module, click-through rate estimation technology is the top priority. Currently, most click-through rate estimation technologies in the industry use deep learning algorithms to train deep neural networks based on data drive. However, the corresponding problem caused by data drive is that new projects in the recommendation system will have a cold start problem.

Exploration-Exploitation (E&E) method is usually used to deal with data cycle problems in large-scale online recommendation systems. Past research usually believed that high uncertainty in model predictions meant high potential returns, so most research literature focused on the estimation of uncertainty. For online recommendation systems using streaming training, the exploration strategy will have a greater impact on the collection of training samples, which in turn affects the further learning of the model. However, most current exploration strategies cannot well model how the explored samples affect subsequent model learning. Therefore, we designed a pseudo-exploration (Pseudo-Exploration) module to simulate the impact of samples being successfully explored and displayed on the subsequent learning of the recommendation model.

The quasi-exploration process is realized by adding adversarial perturbations to the model input. We also provide the corresponding theoretical analysis and proof of this process. Based on this, we name this method as an exploration strategy based on adversarial gradient (Adversarial Gradient driven Exploration, hereinafter referred to as AGE). In order to improve the efficiency of exploration, we also propose a dynamic gating unit to filter low-value samples to avoid wasting resources on low-value exploration. In order to verify the effectiveness of the AGE algorithm, we not only conducted a large number of experiments on public academic data sets, but also deployed the AGE model to the Alimama display advertising platform and achieved good online returns. This work has been included as a Full Paper in the KDD 2022 Research Track. You are welcome to read and communicate.

Paper: Adversarial Gradient Driven Exploration for Deep Click-Through Rate Prediction

Download: https://arxiv.org/abs/2112.11136

2. Background

In advertising systems, click-through rate (CTR) estimation models are usually trained using streaming methods, and the source of streaming data is produced by the CTR model deployed online, which creates the so-called data cycle question. Because cold start and long-tail ads are not fully displayed, the CTR model lacks training data for these ads, which also leads to the model's estimation of this part of ads. There may be large errors, which will make it more difficult to display these ads, making it difficult to complete the cold start. Boot process. Specifically, Figure 1 shows the relationship between the true click-through rate of an advertisement and the number of impressions: In our system, a new advertisement needs to be displayed an average of about 10,000 times before its click-through rate can reach a convergence state. This brings a common problem to many online systems, that is, how to cold-start these advertisements while ensuring user experience.

Exploration model based on adversarial gradient and its application in click prediction

Figure 1: The relationship between advertising CTR and number of impressions

Exploration and exploitation (E&E) algorithms are usually used to solve the above problems. In recommendation or advertising systems, common methods (such as Contextual Multi-Armed Bandits, contextual multi-armed bandits) generally model this problem as follows. At each step, the system selects an action based on policy P (i.e., recommends an item _ _ to the user). In order to maximize the cumulative reward (usually measured by total clicks), the system needs to weigh whether the current emphasis is exploration or exploitation. Prior research generally considers high uncertainty to be a measure of potential returns. On the one hand, strategy P needs to give priority to projects with greater current utility to maximize the current round of benefits; on the other hand, the algorithm also needs to select operations with greater uncertainty to achieve exploration. If used to represent the strategy of weighing exploration and exploitation, the final score of the project by the system can be expressed by the following formula:

Uncertainty estimation has become the core module of many E&E algorithms. Uncertainty may originate from data variability, measurement noise, and model instability (e.g., randomness of parameters). Typical estimation methods include Monte Carlo MC-Dropout, Bayesian neural network, and prediction uncertainty. Gaussian processes, and uncertainty modeling based on gradient norms (model weights), etc. On this basis, there are two typical exploration strategies: UCB-based methods usually use the upper limit of potential returns as the final score [1, 2], while Thompson sampling-based methods are completed by sampling from the estimated probability distribution [ 3].

3. Method introduction

We believe that the above method does not consider a complete exploration closed loop. For data-driven online systems, the ultimate benefit of exploration comes from the feedback data obtained from the exploration process, and the training and updating of the model based on the feedback data. The uncertainty of the model estimate itself does not fully reflect the entire feedback loop. To this end, we introduced a quasi-exploration module to simulate the impact of feedback data on the model after completing the exploration action, and use this to measure the effectiveness of exploration. The analysis found that the effectiveness of exploration depends not only on the estimated uncertainty of the model, but also on the size of the "counter-disturbance". The so-called adversarial perturbation refers to the perturbation vector with a fixed module length added to the input of the model that causes the largest change in the model output. In the paper, we also proved that after the model is trained once using the explored data, the expectation of model output change is equivalent to adding an increment vector with a module length of uncertainty and a disturbance vector of adversarial gradient to the input vector. We verified that modeling in this way can estimate the subsequent impact of the explored samples on the model in a closed-loop manner, thereby estimating the true value of the explored samples.

We call this method Adversarial Gradient driven Exploration, or AGE for short. The AGE model consists of two parts: the Pseudo-Exploration Module and the Dynamic Gating Unit. Its overall structure is shown in Figure 2.

Exploration model based on adversarial gradient and its application in click prediction

Figure 2: AGE structure diagram

For detailed introduction of some of them, please see Section 3.1, and for some details, please see Section 3.3.

3.1 Pseudo-Exploration Module

3.1.1 Module Introduction

The main purpose of the Pseudo-Exploration Module is to quantitatively simulate the use of the model After the exploration sample is trained, the change in the score of the sample is estimated to estimate the closed-loop impact of exploration on the model. After derivation, we found that the above process can be completed through formula (2), which represents the score of the sample after exploration by the model, which we use for the final ranking.

The above formula means that we do not need to perform any operations on the original model parameters. We only need to add the product of adversarial gradient, estimated uncertainty and manually set hyperparameters to the input representation. Complete the simulation of the estimated score of the model after exploration. Among them, the calculation method of parameters and will be introduced in the next section. Later in this section, we will introduce the detailed derivation process of formula (2) in the proposed exploration module.

3.1.2 Detailed derivation

For each data sample, the training of the model will affect two parts of parameters: the representation corresponding to the sample (including products, User embedding, etc.) and model parameters. Because the goal of model parameters in training is to adapt to all samples rather than a single sample, we can think that training a single sample will mainly affect the representation of the sample, and the model parameters themselves only need minor adjustments. Therefore, in subsequent studies, we will ignore the adjustment and only focus on the changes in the representations corresponding to the samples. Assuming that the true label of the sample containing the representation is, during training, we need to find the amount of updates to minimize the loss function. Based on this, we define:

Exploration model based on adversarial gradient and its application in click prediction

# which represents the loss function used in training, and the cross-entropy loss function is generally used in CTR prediction tasks. At the same time, we use to constrain the maximum change in representation. In order to simplify writing, we will write the right side of the above formula as.

According to Lagrange's mean value theorem, when the second norm of is close to 0, we can derive the above loss function formula (3) as:

Exploration model based on adversarial gradient and its application in click prediction

We observe formula (4) and it is easy to find that the loss function obtains the minimum value when it has the opposite direction to the two vectors. In equation (3), we constrain against perturbations. Therefore, by solving equation (3), we get:

Exploration model based on adversarial gradient and its application in click prediction

In practice, we use to replace the normalized gradient in equation (5). By deriving the chain rule, it can be expanded into two parts: and . Further calculations yield:

Exploration model based on adversarial gradient and its application in click prediction

#In the above equation, we will rescale to keep the equation true. Although the meanings are different, they are all manually adjusted hyperparameters, so we can directly complete the replacement. We further simplify formula (6) as:

Exploration model based on adversarial gradient and its application in click prediction

In the above formula, the normalized gradient represents the direction of the derivative of the model output relative to the input representation. Since real user feedback is not available at exploration time, we will use estimation uncertainty to measure the difference between the predicted score and real user feedback.

In formula (7), we find the analytical solution that can maximize the change in model prediction output under the constraints of (the derivation is the same as formula (3) to formula (5)). Furthermore, we also find that the above process of adding input representation is in the same form as adversarial perturbation (see equation (9)).

Exploration model based on adversarial gradient and its application in click prediction

Therefore, we use adversarial gradient to replace in formula (7), and name our method as adversarial gradient-based exploration algorithm.

Formula (9) shows that the most effective way to explore AGE is to add adversarial perturbations to the representation input, and use the output of the perturbed model as the sorting factor: the direction of the perturbation vector represented by the adversarial gradient as the input, and The degree of perturbation in the forecast uncertainty. Therefore, after obtaining the sum, we can use the following formula to calculate the model prediction score after exploration, which is the aforementioned formula (2).

3.2 Implementation details

In AGE, we use the MC-Dropout method to estimate uncertainty. Specifically, MC-Dropout assigns a random Mask weight to each neuron in the deep model. The specific method is shown in the following formula (11). One benefit of this method is that we can directly obtain the uncertainty without changing the original structure of the model. In actual operation, the uncertainty can be expressed by calculating the variance of dropout through the idea of ​​​​UCB, or by referring to the Thompson random sampling method by calculating the difference between the sampling and the mean, that is, formula (12) and formula (13) ).

Exploration model based on adversarial gradient and its application in click predictionExploration model based on adversarial gradient and its application in click predictionExploration model based on adversarial gradient and its application in click prediction

The normalized adversarial gradient can be calculated according to the fast gradient method (FGM) in formula (8). In order to calculate the adversarial gradient more accurately, we can further use the proximal gradient descent (PGD) method to iteratively update the gradient in multiple steps, as shown in Equation (14).

Exploration model based on adversarial gradient and its application in click prediction

3.3 Dynamic Gating Unit

In practice, we found that not all ads are worth exploring. In a general Top-K advertising system, the number of ads that can be displayed to end users is relatively small. Therefore, for ads with low click-through rates (for example, low-quality ads themselves), even if the model has high uncertainty in the prediction of these ads, the exploratory value is still very low considering the business attributes of the advertising system. . Although we can obtain a large amount of data on these advertisements through exploration, so that these advertisements can be fully trained by the model and estimate more accurately; however, the low click-through rate of these advertisements will make it impossible to obtain these advertisements by themselves even after full exploration. Traffic, such exploration is undoubtedly inefficient. In this paper, we tried a simple heuristic to make exploration more efficient - if the model's estimated score for the ad is higher than the average click-through rate for the ad across all groups, we will explore; otherwise, Exploration will not happen.

In order to calculate the average click-through rate of ads, we introduced the Dynamic Gating Threshold Unit (DGU) module. DGU uses only ad-side features as input to estimate the average click-through rate of ads. When the model's estimated click-through rate is lower than the average ad click-through rate estimated by the DGU module, no exploration will be performed. Otherwise, normal exploration will be performed. The process is shown in the following formula:

Exploration model based on adversarial gradient and its application in click prediction

Finally, we will substitute it into formula (10) to obtain the final and complete calculation method of the AGE exploration model as follows.

4. Experimental evaluation

4.1 Offline experiment

We compared three major categories of baseline methods, including exploration methods based on random sampling, exploration methods based on deep models, and gradient-based Exploration method, the results are shown in Table 1. It can be observed that the baseline models built based on the Thompson Sampling (TS) method are better than the models based on UCB, proving that TS is a better algorithm for measuring model uncertainty. Furthermore, we can observe that the AGE algorithm outperforms all baseline methods, which also proves the effectiveness of the AGE method. Specifically, both AGE-TS and AGE-UCB outperform the best baselines UR-gradient-TS and UR-gradient-UCB [4], with improvement values ​​of 5.41% and 15.3% respectively. The AGE-TS method increases clicks by a full 28.0% compared to the baseline method without exploration. It is worth noting that the AGE-based UCB and TS algorithms AGE-UCB and AGE-TS achieve similar results, which is not the case for the gradient-based UCB and TS algorithms, which also proves that AGE can compensate for the instability of the UCB method.

Exploration model based on adversarial gradient and its application in click prediction

Table 1: Offline experimental results

We also conducted a large number of ablation experiments to prove the effectiveness of each module. As shown in Table 2, the threshold unit, adversarial gradient, and uncertainty unit are all indispensable. In order to further determine the effect of DGU, we tried different fixed threshold parameters, and finally found that their effects were not as good as the dynamic threshold of DGU.

Exploration model based on adversarial gradient and its application in click prediction

Table 2: Ablation experiment results

4.2 Online experiment

We also deployed the AGE model into the Alimama display advertising system. In order to accurately evaluate the exploration value of the model, we designed an evaluation method based on fair buckets. As shown in Figure 3, we first set up bucket C and bucket D for data collection. In bucket D, we deploy exploration algorithms such as AGE, while in bucket C, we adopt the conventional CTR model without exploration. After a period of time, we apply the feedback data obtained from bucket C and bucket D to the training of models deployed on fair buckets A and B respectively. Finally, we will compare the model performance on fair buckets A and B. In online experiments, we use several standard indicators for evaluation, including click-through rate (CTR), number of displayed ads PV and PCOC, the ratio of predicted CTR to real CTR. Additionally, we introduced an operational metric (AFR) to measure advertiser satisfaction.

Exploration model based on adversarial gradient and its application in click prediction

Figure 3: Fair Bucket Experiment Plan

As shown in Table 3, the above indicators have been effectively improved. Among them, AGE significantly outperforms all other methods: CTR and PV are 6.4% and 3.0% higher than the baseline model, respectively. At the same time, the use of the AGE model also improves the prediction accuracy of the model, that is, the prediction accuracy PCOC is closer to 1. More importantly, the AFR indicator also increased by 5.5%, which shows that our exploration method can effectively improve the advertiser experience.

Exploration model based on adversarial gradient and its application in click prediction

Table 3: Online Experiment Results

5. Summary

Different from most exploration & exploitation methods that focus on estimating potential returns , our approach AGE reframes this problem from a data-driven perspective of online learning. In addition to estimating the uncertainty of the current model prediction, the AGE algorithm uses the quasi-exploration module to further consider the subsequent impact of exploration samples on model training. We conducted A/B testing experiments on both academic research data sets and production links, and the relevant results confirmed the effectiveness of the AGE method. In the future, we will deploy AGE in more application scenarios.

The above is the detailed content of Exploration model based on adversarial gradient and its application in click prediction. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:51cto.com. If there is any infringement, please contact admin@php.cn delete