Application of Causal Correction Methods in Ant's Marketing Recommendation Scenario
In a recommendation system, collected data is used to train recommendation models that recommend appropriate items to users. When users interact with the recommended items, the newly collected data is used to further train the model, forming a closed loop. However, various influencing factors inside this loop can introduce errors. The main cause is that most of the data used to train the model is observational data rather than ideal training data, and is therefore affected by factors such as the exposure strategy and user self-selection. The essence of this bias is the gap between the expectation of the empirical risk estimate and the expectation of the true, ideal risk estimate.
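In symbols (our own notation for illustration, not necessarily the authors'), the bias described here can be written as:

```latex
\mathrm{Bias}(f) \;=\; \mathbb{E}_{\text{obs}}\!\left[\hat{R}(f)\right] \;-\; \mathbb{E}\!\left[R_{\text{ideal}}(f)\right]
```

where $\hat{R}(f)$ is the empirical risk computed on the observational (exposure- and selection-affected) data and $R_{\text{ideal}}(f)$ is the risk under the ideal, uniformly exposed distribution; the training signal is unbiased exactly when this difference is zero.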
The more common biases in recommendation and marketing systems mainly include the following three kinds. There are also other biases, such as position bias and conformity bias.
Let's use an example to understand how bias affects the modeling process. Suppose we want to study the relationship between coffee drinking and heart disease. We observe that coffee drinkers are more likely to develop heart disease, and might conclude that there is a direct causal relationship between the two. However, we need to be aware of confounding factors. Suppose, for example, that coffee drinkers are also more likely to be smokers, and smoking itself is linked to heart disease. Then we cannot simply attribute the association between coffee drinking and heart disease to causation; it may instead be due to smoking acting as a confounder. To study the relationship more accurately, we need to control for the effect of smoking. One approach is a matched study, in which smokers are paired with non-smokers and the coffee–heart disease relationship is compared within each group, eliminating the confounding effect of smoking. Causality is a "what if" question: holding everything else fixed, would a change in coffee drinking lead to a change in the occurrence of heart disease? Only after controlling for confounders can we more accurately determine whether there is a causal relationship between coffee drinking and heart disease.
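The coffee/smoking story can be sketched with a tiny simulation. All numbers below are purely illustrative assumptions: smoking drives both coffee drinking and heart disease, while coffee has no direct effect. The naive comparison then shows a spurious association, and stratifying on the confounder removes it.

```python
import random

random.seed(0)

def simulate(n=100_000):
    """Generate (smoker, coffee, disease) triples with smoking as a confounder."""
    rows = []
    for _ in range(n):
        smoker = random.random() < 0.3
        # Smokers are more likely to drink coffee (the confounding link).
        coffee = random.random() < (0.7 if smoker else 0.3)
        # Heart disease depends on smoking only, NOT on coffee.
        disease = random.random() < (0.20 if smoker else 0.05)
        rows.append((smoker, coffee, disease))
    return rows

def rate(rows, coffee_val, smoker_val=None):
    """Disease rate among people with the given coffee (and optionally smoking) status."""
    sel = [d for s, c, d in rows
           if c == coffee_val and (smoker_val is None or s == smoker_val)]
    return sum(sel) / len(sel)

rows = simulate()
# Naive comparison: coffee drinkers look riskier (spurious association).
print(rate(rows, True), rate(rows, False))
# Stratified by smoking, the apparent coffee "effect" disappears in each stratum.
print(rate(rows, True, smoker_val=True), rate(rows, False, smoker_val=True))
print(rate(rows, True, smoker_val=False), rate(rows, False, smoker_val=False))
```

The naive disease-rate gap between coffee drinkers and non-drinkers is entirely explained by the different smoker proportions in the two groups, which is exactly the matched-study intuition above.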
How can this problem be avoided? One common method is to introduce unbiased data and use it to help the model learn unbiased representations; another is to start from the perspective of a causal graph and correct the bias by adjusting the observational data after the fact. Causal correction means processing the data or the model by causal means to remove the influence of bias.
A causal graph is a directed acyclic graph used to describe the causal relationships between the nodes in a scenario. It is built from three basic structures: the chain, the fork, and the collider.
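The collider structure behaves differently from chains and forks: two independent causes become spuriously correlated once we condition on their common effect. A minimal sketch, with made-up binary variables:

```python
import random

random.seed(1)

n = 200_000
samples = []
for _ in range(n):
    x = random.random() < 0.5   # cause 1
    z = random.random() < 0.5   # cause 2, independent of x
    y = x or z                  # collider: common effect of x and z
    samples.append((x, z, y))

def p_z_given(x_val, y_val=None):
    """P(Z=1) among samples with X=x_val (optionally also conditioning on Y)."""
    sel = [z for x, z, y in samples
           if x == x_val and (y_val is None or y == y_val)]
    return sum(sel) / len(sel)

# Unconditionally, Z is independent of X: both rates are close to 0.5.
print(p_z_given(True), p_z_given(False))
# Conditioning on the collider Y induces dependence:
# among samples with y=True, observing x=False forces z=True.
print(p_z_given(True, y_val=True), p_z_given(False, y_val=True))
```

This is why blindly conditioning on every available variable can itself create bias; only variables that block backdoor paths (as discussed next) should be adjusted for.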
The example above helps explain backdoor paths and the backdoor criterion. A backdoor path is a path that connects X to Y but does not start from X's own causal effect; instead it runs through a confounder Z and eventually points to Y. As in the earlier example, the relationship between COVID-19 infection and mortality is not purely causal: infection is affected by age, older people are more likely to be infected with COVID-19, and their mortality rate is also higher. However, if we have enough data to block all backdoor paths between X and Y, that is, condition on Z, then X and Y can be modeled free of the confounder's influence, and we can recover the real causal relationship.
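Blocking the backdoor path amounts to the standard adjustment formula P(Y | do(X=x)) = Σ_z P(Y | x, z) P(z). A small numeric sketch of the infection/age example, with entirely hypothetical probabilities chosen for illustration:

```python
# Z: age group; X: infected (bool); Y: death. All numbers are hypothetical.
p_z = {'old': 0.2, 'young': 0.8}
# P(Y=1 | X, Z): mortality depends strongly on age.
p_y_given_xz = {
    ('old', True): 0.10, ('old', False): 0.05,
    ('young', True): 0.01, ('young', False): 0.002,
}
# P(X=1 | Z): older people are more likely to be infected.
p_x_given_z = {'old': 0.4, 'young': 0.1}

def p_y_given_x(x):
    """Naive conditional P(Y=1 | X=x): mixes in the age backdoor path."""
    num = sum(p_y_given_xz[(z, x)]
              * (p_x_given_z[z] if x else 1 - p_x_given_z[z]) * p_z[z]
              for z in p_z)
    den = sum((p_x_given_z[z] if x else 1 - p_x_given_z[z]) * p_z[z]
              for z in p_z)
    return num / den

def p_y_do_x(x):
    """Backdoor-adjusted P(Y=1 | do(X=x)) = sum_z P(Y|x,z) * P(z)."""
    return sum(p_y_given_xz[(z, x)] * p_z[z] for z in p_z)

print(p_y_given_x(True) - p_y_given_x(False))  # inflated by the age confounder
print(p_y_do_x(True) - p_y_do_x(False))        # adjusted causal effect
```

The naive conditional effect is larger than the adjusted one, because the infected group is skewed toward the older, higher-mortality stratum.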
The following introduces the Ant team's work based on data-fusion correction, published in the Industry Track of SIGIR 2023. The idea of the work is to use unbiased data for data augmentation and to guide the correction of the model.
The overall distribution of unbiased data differs from that of biased data. Biased data is concentrated in one part of the sample space, and the missing samples are concentrated in regions with relatively little biased data. So if the augmented samples fall close to regions with more unbiased data, the unbiased data plays the larger role; if they fall close to regions dense in biased data, the biased data plays the larger role. For this purpose, the paper designs an MDI model that can better exploit both unbiased and biased data for data augmentation.
The figure above shows the framework of the algorithm. The MDI model uses meta-learning to adjust the sample weights on the unbiased data and the weighting coefficient. MDI training has two stages:
The first stage trains the fusion debiasing model f_d by optimizing the loss L(f_d). The final loss has two main terms, L_IPS and L_IMP. L_IPS is the IPS term optimized on the original samples, where any model can be used to derive the propensity score (the probability that a sample belongs to the unbiased data rather than the biased data). The second term, L_IMP, is weighted by a preset coefficient for the augmentation module, where r_{u,i} is the pseudo-label generated by the augmentation module, and p_{u,i} and 1 - p_{u,i} weight the contributions of the unbiased teacher model and the fusion model on the current sample. To let the propensity model learn more complex pattern information, f_p is solved through meta-learning.
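To make the two-term loss concrete, here is a rough sketch of an IPS term plus a weighted imputation (augmentation) term. All names and the squared-error form are our own illustrative choices, not the paper's exact MDI formulation:

```python
def ips_loss(preds, labels, propensities):
    """Inverse-propensity-scored loss: reweighting each observed sample by
    1/propensity makes the average approximate the ideal (unbiased) risk."""
    return sum((p - y) ** 2 / max(e, 1e-6)
               for p, y, e in zip(preds, labels, propensities)) / len(preds)

def imputation_loss(preds, imputed_labels):
    """Loss against pseudo-labels produced by the augmentation module."""
    return sum((p - r) ** 2 for p, r in zip(preds, imputed_labels)) / len(preds)

def total_loss(preds, labels, propensities, imputed_labels, lam=0.5):
    # L = L_IPS + lambda * L_IMP, mirroring the two-term loss in the text.
    return (ips_loss(preds, labels, propensities)
            + lam * imputation_loss(preds, imputed_labels))

print(total_loss([0.8, 0.2], [1.0, 0.0], [0.5, 0.25], [0.9, 0.1], lam=0.5))
```

In MDI the weighting coefficient and the pseudo-labels are themselves learned (via meta-learning on the unbiased data) rather than fixed as they are here.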
The following is the complete training process of the algorithm:
We evaluated on two public datasets, Yahoo R3 and Coat. Yahoo R3 collects 15,000 users' ratings of 1,000 songs, with a total of 310,000 biased samples and 5,400 unbiased samples. The Coat dataset collects 6,900 biased samples and 4,600 unbiased samples from 290 users' ratings of 300 items of clothing. In both datasets, ratings range from 1 to 5. The biased data comes from the platform's normal user interactions, and the unbiased samples are collected by having users rate randomly assigned items.
In addition to the two public datasets, Ant also used a dataset from a real industry scenario. To simulate the situation where unbiased samples are very scarce, we combined all of the biased data with 10% of the unbiased data for training, kept another 10% of the unbiased data for validation, and used the remaining 80% as the test set.

The baselines we compared are mainly as follows. The first group consists of models trained on unbiased data alone, on biased data alone, and on a direct fusion of both. The second uses a small amount of unbiased data to design a regularizer that constrains the similarity between the biased and unbiased representations to perform the correction. The third is the inverse propensity weighting method, which weights each sample by the inverse of its propensity score. Doubly Robust is another common correction method. Propensity-Free Doubly Robust (PFDR) is a data-augmentation method that first learns an augmentation model from unbiased samples and then uses the augmented samples to help the whole model correct bias. AutoDebias also uses some unbiased data for augmentation to help correct model bias.
We used two metrics, MSE and MAE, to evaluate performance. As shown in the figure, our proposed MDI method performs relatively well on both metrics on the Coat and Product datasets.
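For reference, the two metrics on the 1-5 rating scale are computed as follows (the sample predictions and labels are made up):

```python
def mse(preds, labels):
    """Mean squared error: penalizes large rating errors quadratically."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(preds)

def mae(preds, labels):
    """Mean absolute error: average absolute rating error."""
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(preds)

preds, labels = [3.5, 2.0, 4.5], [4, 2, 5]  # ratings on the 1-5 scale
print(mse(preds, labels), mae(preds, labels))
```

Lower is better for both; MSE is more sensitive to occasional large errors, while MAE weights all errors linearly.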
On the Yahoo R3 dataset, our method has the best MAE, and the best MSE among all methods except IPS. The three data-augmentation methods, PFDR, AutoDebias, and our proposed MDI, perform better in most cases. However, since PFDR trains its augmentation model on unbiased data in advance, it relies heavily on the quality of that data; on Coat it has only 464 unbiased training samples, and with so few unbiased samples its augmentation module is weak and its performance relatively poor.
AutoDebias behaves in exactly the opposite way to PFDR across the datasets. Since MDI's augmentation method uses both unbiased and biased data, its augmentation module is stronger, so it obtains better results both when unbiased data is scarce and when it is plentiful.
We also evaluated these models under different proportions of unbiased data on the two public datasets, using varying proportions of the unbiased data together with all of the biased data for training; another 10% of the unbiased data is held out for validation, and the remaining data is used for testing. Otherwise, the setting is the same as in the previous experiment.
The figure above shows the MAE of the different methods under different proportions of unbiased data. The horizontal axis is the proportion of unbiased data, and the vertical axis is each method's performance on the unbiased test data. As the proportion of unbiased data increases, the MAE of AutoDebias, IPS, and Doubly Robust shows no obvious decline. In contrast, the non-debiasing method that directly fuses the original data declines quite noticeably: the higher the proportion of unbiased samples, the better the overall data quality, so the model can learn a better fit.
When Yahoo R3 uses more than 30% unbiased data for training, this direct-fusion method even surpasses all other bias-correction methods except MDI. However, MDI still achieves relatively better performance, which shows that the MDI method is robust across unbiased datasets of different sizes.

We also conducted ablation experiments on the three datasets to verify whether each part of the augmentation module is effective.
Setting λ = 0 removes the augmentation module entirely; p_{u,i} = 1 means that only unbiased data is used to model the augmentation module; p_{u,i} = 0 means that only the fused biased-and-unbiased data models the augmentation module. The figure above shows the ablation results. The full MDI method achieves relatively good results on all three datasets, indicating that the augmentation module is necessary. Whether on the public datasets or on the dataset from a real business scenario, our proposed augmentation method that fuses unbiased and biased data outperforms the previous fusion schemes, and the robustness of MDI is further verified through parameter-sensitivity and ablation experiments.
Let's introduce another work by the team: correction based on backdoor adjustment, also published in the Industry Track of SIGIR 2023. The application scenario is marketing recommendation. As shown in the figure below, if interactions between users and coupons (or any advertisement or item) were not subject to any intervention, every interaction would be equally possible, and each coupon would have an equal chance of being exposed to any user.

But in real business scenarios, in order to protect or help small merchants gain traffic and to ensure the overall user participation experience, a series of strategy constraints is usually set. As a result, some users are more often exposed to certain coupons and other users to different ones. This kind of intervention is the confounder mentioned above.

What problems does such intervention cause in the e-commerce marketing scenario? As shown in the figure above, for simplicity we divide users into two categories, high and low willingness to participate, and coupons into two categories, large and small discounts. The height of each bar represents the global proportion of the corresponding sample: the higher the bar, the larger that sample's share of the overall training data. Samples pairing small-discount coupons with high-willingness users make up the majority, which causes the model to learn the distribution shown in the figure: the model comes to believe that high-willingness users prefer small-discount coupons. But in fact, facing the same usage threshold, users will certainly prefer coupons with larger discounts, since they save more money.
The figure shows that the actual conversion probability of small-discount coupons is lower than that of large-discount coupons, yet for a given sample the model estimates a higher redemption probability for the small-discount coupon and therefore recommends it. This creates a paradox.

We can analyze the cause of this paradox from the perspective of a causal graph, for an uncorrected recommendation model applied in this scenario. Its causal graph is shown in the figure above. U represents the user's representation and I the item's representation; D and K are the historical interactions from the user's perspective and the coupon's perspective, respectively; T represents the rule constraints set by the current business. T cannot be quantified directly, but we can indirectly see its impact on users and items through D and K. y represents the interaction between the user and the item, i.e., whether the item is clicked, redeemed, and so on. The conditional probability expressed by the causal graph is shown in the upper right of the figure, derived by Bayes' rule. Given U and I, the final P(y | u, i) depends not only on U and I: U is affected by d_u, so given u, the probability P(d_u | u) also enters; likewise, given I, I is affected by k_i. The reason for this is that the existence of D and K creates backdoor paths in the scene, i.e., paths that do not start from U's direct effect but eventually point to y (the U-D-T-y or I-K-T-y paths). These represent spurious associations: U can affect y not only through T, but also through D.
The adjustment method artificially cuts off the path from D to U, so that U can only affect y directly (U-y) or via T (U-T-y). This removes the spurious correlation and thereby models the true causal relationship. Backdoor adjustment applies do-calculus to the observational data, using the do operator to aggregate over all D and all K so that U and I are no longer affected by D and K; in this way a true causal relationship is modeled.

The derived approximate form of the formula is shown in the figure below. Equation 4a has the same form as the earlier 3b; 4b is an approximation over the sample space, because the sample spaces of D and K are in theory infinite, so the approximation can only be made from the collected data (a finite sample of D and K). 4c and 4d both derive approximations of the expectation, so that in the end only one additional unbiased representation T needs to be modeled: by traversing and summing the representation probability distributions of users and items in all situations, this extra model of T helps obtain the final unbiased estimate.

The experiments use two open-source datasets, the Tianchi and 84.51 (coupon) datasets, with sampling used to simulate the impact of rule strategies on the overall data. Data from a real e-commerce marketing campaign was also used to evaluate the algorithm.
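The aggregation over all D and all K described above corresponds to the standard backdoor-adjustment form; in our own hedged notation (not the paper's exact equations 4a-4d), it reads:

```latex
P\big(y \mid do(u),\, do(i)\big)
\;=\; \sum_{d}\sum_{k} P\big(y \mid u, i, d, k\big)\, P(d)\, P(k)
\;\approx\; \sum_{d \in \mathcal{D}}\sum_{k \in \mathcal{K}} P\big(y \mid u, i, d, k\big)\, \hat{P}(d)\, \hat{P}(k)
```

where the second step restricts the in-principle infinite spaces of D and K to the finite sets $\mathcal{D}, \mathcal{K}$ observed in the collected data, with $\hat{P}$ the empirical distributions. Because the do operator marginalizes D and K out with their own prior probabilities rather than conditioning on u and i, the backdoor paths through T are blocked.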
We compared several mainstream correction methods. IPW uses inverse probability weighting to correct bias; Unawareness alleviates the impact of bias by removing biased features; FairCo obtains relatively unbiased estimates by introducing an error-term constraint on the representation; MACR uses a multi-task framework to estimate user conformity and item popularity separately, and subtracts both at prediction time to achieve an unbiased estimate; PDA removes the influence of popularity bias by adjusting the loss term through causal intervention; DecRS also uses backdoor adjustment to remove bias, but only corrects it from the user's perspective.

The evaluation metric is AUC: in the marketing promotion scenario there is only one recommended coupon or candidate product, so the task is essentially binary classification, for which AUC is appropriate. Comparing performance under the DNN and MMOE architectures, our proposed DMBR model outperforms both the original uncorrected method and the other correction methods. Meanwhile, Ds_A and Ds_B achieve larger improvements on the simulated datasets than on the real business dataset, because real business data is more complex and is affected not only by rules and policies but possibly by other factors as well.

The model has been deployed in an e-commerce marketing campaign. The figure above shows the online results: compared with the baseline model, DMBR achieves a certain improvement in redemption rate and redeemed volume.
At Ant, causal correction methods are mainly used in scenarios with rule or policy constraints. For example, in advertising scenarios there may be restrictions on which audiences different ads can be served to: some pet-related ads are targeted more at pet-owning users. In e-commerce marketing scenarios, strategies are set up to guarantee traffic for small merchants and avoid all traffic being consumed by large merchants, and to protect the experience of users participating in campaigns: because the overall campaign budget is limited, if some opportunistic users repeatedly participate, they take up a large share of resources and degrade the participation experience for other users. Scenarios like these are where causal correction is applied.