Exploration and Practice of Coarse Ranking Optimization in Meituan Search

Posted by WBOY, 2023-04-12 11:31:09

Authors: Xiao Jiang, Suo Gui, Li Xiang, et al.

Coarse ranking is an important module of industrial search, recommendation, and advertising systems. In exploring how to improve the coarse ranking effect, the Meituan search ranking team optimized coarse ranking along two lines grounded in actual business scenarios: linkage with fine ranking, and joint optimization of effect and performance.

1. Foreword

In large-scale industrial applications such as search, recommendation, and advertising, ranking systems commonly adopt a cascade architecture [1,2] to balance performance and effect, as shown in Figure 1 below. Taking the Meituan search ranking system as an example, ranking is divided into coarse ranking, fine ranking, re-ranking, and mixed ranking layers. Coarse ranking sits between recall and fine ranking: it must filter a candidate set on the order of thousands of items down to a set on the order of hundreds, which it then passes to the fine ranking layer.

Figure 1 Ranking funnel

Viewed from the perspective of the full Meituan search ranking pipeline, coarse ranking layer optimization currently faces several challenges:

  • Sample selection bias: In a cascade ranking system, coarse ranking is far from the final displayed results, so the sample space used for offline training of the coarse ranking model differs greatly from the sample space it must predict on, which produces serious sample selection bias.
  • Linkage between coarse and fine ranking: Coarse ranking sits between recall and fine ranking, so it needs to obtain and use more information from the downstream stages to improve its effect.
  • Performance constraints: The candidate set scored by online coarse ranking is much larger than that of the fine ranking model, yet the overall search system has strict performance requirements, so coarse ranking must pay close attention to prediction performance.

This article shares the exploration and practice of coarse ranking layer optimization in Meituan Search around the challenges above; we address sample selection bias together with the fine ranking linkage problem. The article has three parts. The first part briefly introduces the evolution of the coarse ranking layer in Meituan search ranking. The second part presents the exploration and practice of coarse ranking optimization: the first piece of work uses knowledge distillation and contrastive learning to link fine ranking with coarse ranking and improve the coarse ranking effect, and the second considers the trade-off between coarse ranking performance and effect. All of this work has been fully deployed online with significant gains. The last part gives a summary and outlook. We hope the content is helpful and inspiring.

2. Evolution of coarse ranking

The evolution of coarse ranking technology in Meituan Search can be divided into the following stages:

  • 2016: Linear weighting of signals such as relevance, quality, and conversion rate. This method is simple, but its ability to express features is weak, the weights are set manually, and there is considerable room to improve the ranking effect.
  • 2017: Pointwise prediction ranking with a simple machine-learned LR model.
  • 2018: A two-tower model based on vector inner product. Query, user, and context features are input on one side and merchant features on the other; after deep network computation, a user & query vector and a merchant vector are produced, and their inner product gives the predicted score used for ranking. Merchant vectors can be computed and stored in advance, so online prediction is fast, but the ability to cross information between the two sides is limited.
  • 2019: To address the two-tower model's inability to model cross features well, the two-tower output is used as a feature and fused with other cross features through a GBDT tree model.
  • 2020 to present: As computing power improved, we began to explore an end-to-end NN coarse ranking model and have continued to iterate on it.
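The two-tower stage described above can be illustrated with a minimal NumPy sketch. The feature sizes, the random weights, and the helper `mlp_tower` are all hypothetical and stand in for trained networks; the point is only that merchant vectors can be cached offline while a single query-side forward pass and one inner product are needed per request.

```python
import numpy as np

def mlp_tower(x, weights):
    """Tiny MLP tower: ReLU on hidden layers, linear output embedding."""
    h = x
    for i, w in enumerate(weights):
        h = h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

rng = np.random.default_rng(0)
# Hypothetical sizes: 8 query-side features, 6 merchant features, 4-dim embeddings.
query_weights = [rng.normal(size=(8, 16)), rng.normal(size=(16, 4))]
merchant_weights = [rng.normal(size=(6, 16)), rng.normal(size=(16, 4))]

# Offline: merchant vectors are computed once and cached.
merchant_features = rng.normal(size=(1000, 6))
merchant_vectors = mlp_tower(merchant_features, merchant_weights)

# Online: only the query-side tower runs per request, followed by one
# matrix-vector product against the cached merchant vectors.
query_features = rng.normal(size=(8,))
query_vector = mlp_tower(query_features, query_weights)
scores = merchant_vectors @ query_vector
top_100 = np.argsort(-scores)[:100]
```

Because the two sides only meet at the final inner product, no feature on the query side can interact with a merchant feature inside the network, which is the limitation the 2019 GBDT fusion step was meant to address.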

At present, industrial coarse ranking models commonly take one of two forms: two-tower models, as used by Tencent [3] and iQiyi [4], and interactive NN models, as used by Alibaba [1,2]. The following sections describe the optimization work done while upgrading Meituan Search coarse ranking to an NN model, in two parts: coarse ranking effect optimization, and joint optimization of effect and performance.

3. Coarse ranking optimization practice

As a large body of effect optimization work [5,6] landed in the fine ranking NN model of Meituan Search, we also began to explore optimizing the coarse ranking NN model. Because coarse ranking has strict performance constraints, the fine ranking optimizations cannot simply be reused. Below we introduce two lines of work: effect optimization through linkage with fine ranking, which transfers the ranking ability of fine ranking to coarse ranking, and a trade-off between effect and performance based on neural architecture search.

3.1 Effect optimization through linkage with fine ranking

Constrained by scoring performance, the coarse ranking model has a simpler structure and far fewer features than the fine ranking model, so its ranking effect is worse. To make up for the loss caused by the simple structure and limited features, we tried knowledge distillation [7] to link fine ranking with coarse ranking and thereby optimize coarse ranking.

Knowledge distillation is a common industry method for simplifying a model's structure while minimizing the loss in effect. It follows a Teacher-Student paradigm: a model with a complex structure and strong learning ability serves as the Teacher, a model with a simpler structure serves as the Student, and the Teacher assists the training of the Student, transferring its "knowledge" to the Student to improve the Student's effect. The schematic of distilling fine ranking into coarse ranking is shown in Figure 2 below. The distillation schemes fall into three types: fine ranking result distillation, fine ranking prediction score distillation, and feature representation distillation. Our practical experience with each of them in Meituan search coarse ranking is described below.

Figure 2 Schematic of distilling fine ranking into coarse ranking

3.1.1 Distillation of the fine ranking result list

Coarse ranking is the stage preceding fine ranking; its goal is to pre-select a higher-quality candidate set to pass to fine ranking. In terms of training samples, besides taking regular user behaviors (clicks, orders, payments) as positive samples and exposed items with no such behavior as negative samples, we can also introduce positive and negative samples constructed from the ranking results of the fine ranking model. This alleviates the sample selection bias of the coarse ranking model to some extent and also transfers the ranking ability of fine ranking to coarse ranking. Below we describe our practical experience using fine ranking results to distill the coarse ranking model in the Meituan search scenario.

Strategy 1: On top of the positive and negative samples from user feedback, randomly select a small number of unexposed samples from the bottom of the fine ranking list to supplement the negative samples for coarse ranking, as shown in Figure 3. This change yields offline Recall@150 (see the appendix for the metric definition) +5PP and online CTR +0.1%.

Figure 3 Supplementing negative examples from the ranking results

Strategy 2: Sample randomly from the fine-ranked set to obtain training samples, use the fine ranking positions as labels, and construct pairs for training, as shown in Figure 4 below. Compared with Strategy 1, offline Recall@150 +2PP and online CTR +0.06%.

Figure 4 Constructing pair samples from ranking order

Strategy 3: Using the sample set from Strategy 2, construct labels by dividing the fine ranking positions into bins, then construct pairs from the binned labels for training. Compared with Strategy 2, offline Recall@150 +3PP and online CTR +0.1%.
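The pair construction in Strategy 3 can be sketched as follows. This is only an illustration under stated assumptions: the bin boundaries (`bin_edges`), the item ids, and the helper names are hypothetical, since the article does not say how the positions are binned.

```python
from itertools import combinations

def bin_label(position, bin_edges):
    """Map a fine ranking position (0 = top) to a bin label; a lower label is better."""
    for label, edge in enumerate(bin_edges):
        if position < edge:
            return label
    return len(bin_edges)

def build_pairs(sampled_positions, bin_edges):
    """Return (better_item, worse_item) pairs for sampled items that fall in different bins.

    sampled_positions: dict mapping item id -> position in the fine ranking list.
    """
    labels = {item: bin_label(pos, bin_edges) for item, pos in sampled_positions.items()}
    pairs = []
    for a, b in combinations(sorted(labels), 2):
        if labels[a] < labels[b]:
            pairs.append((a, b))
        elif labels[b] < labels[a]:
            pairs.append((b, a))
    return pairs

# Hypothetical sample: positions 0-9 form bin 0, 10-49 form bin 1, the rest bin 2.
sampled = {"poi_a": 2, "poi_b": 7, "poi_c": 40, "poi_d": 45}
pairs = build_pairs(sampled, bin_edges=[10, 50])
```

Items in the same bin produce no pair, so small position differences inside a bin are ignored; under Strategy 2, by contrast, any two sampled items with different positions would form a pair.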

3.1.2 Fine ranking prediction score distillation

Distilling ranking results as described above is a relatively coarse way of using fine ranking information. On top of it we add prediction score distillation [8], with the aim of aligning the scores output by the coarse ranking model with the score distribution output by the fine ranking model as closely as possible, as shown in Figure 5 below:

Figure 5 Auxiliary loss constructed from fine ranking prediction scores

In terms of implementation, we use a two-stage distillation paradigm: the coarse ranking model is distilled from a pre-trained fine ranking model. The distillation loss is the squared error between the coarse ranking model output and the fine ranking model output, and a parameter Lambda controls the influence of the distillation loss on the final loss, as shown in formula (1). With fine ranking score distillation, offline Recall@150 +5PP and online CTR +0.05%.
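A loss of this shape, a base training loss plus Lambda times a squared-error distillation term, might look like the sketch below. The binary cross-entropy base loss, the sigmoid on the coarse output, and all names are assumptions for illustration; the article does not specify the base loss or the exact form of formula (1).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def distillation_loss(coarse_logits, fine_scores, labels, lam):
    """Base loss plus lam * squared-error distillation term.

    coarse_logits: raw outputs of the coarse ranking model (student).
    fine_scores:   scores in [0, 1] from the frozen, pre-trained fine ranking model (teacher).
    labels:        binary feedback labels; binary cross-entropy is an assumed base loss.
    lam:           weight of the distillation term (the "Lambda" in the text).
    """
    p = np.clip(sigmoid(coarse_logits), 1e-7, 1 - 1e-7)
    base = -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    distill = np.mean((p - fine_scores) ** 2)
    return base + lam * distill

logits = np.array([2.0, -1.0, 0.5])
labels = np.array([1.0, 0.0, 1.0])
teacher = np.array([0.9, 0.2, 0.7])
loss = distillation_loss(logits, teacher, labels, lam=0.5)
```

In the two-stage setup the teacher scores are fixed inputs, so only the coarse ranking model receives gradients from both terms.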

3.1.3 Feature Representation Distillation

Using knowledge distillation so that fine ranking guides the representation modeling of coarse ranking has been verified in the industry as an effective way to improve model effect [7]. However, distilling representations directly with traditional methods has two shortcomings. First, it cannot distill the ranking relationship between coarse and fine ranking, and as noted above, ranking result distillation improved both offline and online effect in our scenario. Second, traditional knowledge distillation uses KL divergence as the measure between representations, which treats each dimension of the representation independently and cannot effectively distill highly correlated, structured information [9]. Data in the Meituan search scenario is highly structured, so representation distillation with traditional strategies may fail to capture this structured knowledge well.

We apply contrastive learning to the coarse ranking model so that it also distills the order relationship while distilling the representations of the fine ranking model. Consider a request q in the dataset, with one positive example under that request and k corresponding negative examples.

We feed the positive example into both the coarse ranking and fine ranking networks to obtain its two representations, and feed the negative examples into the coarse ranking network to obtain their coarse ranking representations. To select positive and negative pairs for contrastive learning, we follow Strategy 3 and divide the fine ranking order into bins: pairs of fine and coarse representations within the same bin are treated as positive examples, pairs across different bins are treated as negative examples, and InfoNCE loss is then used to optimize this objective:

[Formula: InfoNCE loss]

where · denotes the dot product of two vectors and τ is the temperature coefficient. Analyzing the properties of the InfoNCE loss shows that minimizing it is equivalent to maximizing a lower bound on the mutual information between the coarse and fine representations. The method therefore maximizes the consistency between fine and coarse representations at the level of mutual information, and can distill structured knowledge more effectively.
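As a rough illustration of the idea, the sketch below computes an InfoNCE-style loss in which a (coarse, fine) representation pair counts as positive when both items fall in the same fine ranking bin. Averaging over several positives per anchor is an assumption borrowed from common multi-positive contrastive setups, and the function and argument names are hypothetical; the article's exact formula is not reproduced here.

```python
import numpy as np

def info_nce(coarse_reps, fine_reps, bins, tau=0.1):
    """InfoNCE-style loss over coarse/fine representation pairs.

    coarse_reps, fine_reps: (n, d) arrays; row i is item i as encoded by each model.
    bins: (n,) bin label of each item, derived from the fine ranking order.
    A (coarse i, fine j) pair is positive when bins[i] == bins[j], negative otherwise.
    """
    logits = (coarse_reps @ fine_reps.T) / tau      # (n, n) dot-product similarities
    logits -= logits.max(axis=1, keepdims=True)     # subtract row max for stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    positive = (bins[:, None] == bins[None, :]).astype(float)
    # Average the log-probability over each anchor's positives, then over anchors.
    per_anchor = (positive * log_prob).sum(axis=1) / positive.sum(axis=1)
    return -per_anchor.mean()

bins = np.array([0, 0, 1, 1])
coarse = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
aligned = info_nce(coarse, coarse.copy(), bins)           # fine reps agree with the bins
misaligned = info_nce(coarse, coarse[::-1].copy(), bins)  # fine reps contradict the bins
```

The loss is small when coarse representations are close to the fine representations of items in the same bin and far from those in other bins, which is how the order information enters the representation distillation.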

Figure 6 Transferring fine ranking information through contrastive learning

Adding the contrastive learning representation distillation loss on top of formula (1) yields offline Recall@150 +14PP and online CTR +0.15%. For details of this work, see our paper [10] (under submission).

3.2 Joint optimization of effect and performance

As mentioned earlier, the candidate set scored by online coarse ranking is large, and given the performance constraints of the full system pipeline, coarse ranking must take prediction efficiency into account. The work described above all follows the paradigm of a simple DNN plus distillation, which has two problems:

  • Limited by online performance, the model currently uses only simple features and does not introduce richer cross features, which leaves room for further improvement in model effect.
  • Distilling into a fixed coarse ranking model structure loses part of the distillation effect and leads to a suboptimal solution [11].

In our practical experience, introducing cross features directly into the coarse ranking layer cannot meet online latency requirements. To solve the problems above, we explored and deployed a coarse ranking modeling scheme based on neural architecture search. It optimizes the effect and performance of the coarse ranking model simultaneously and selects the best feature combination and model structure that satisfy the coarse ranking latency requirements. The overall architecture is shown in Figure 7 below:

Figure 7 NAS-based feature and model structure selection

Below we briefly introduce two key technical points: neural architecture search (NAS) and efficiency modeling:

  • Neural architecture search: As shown in Figure 7 above, we adopt a modeling approach based on ProxylessNAS [12]. Besides the network parameters, model training adds feature mask parameters and network architecture parameters; these parameters are differentiable and are learned together with the model objective. For feature selection, we attach to each feature a mask parameter based on a Bernoulli distribution, see formula (4); the parameter θ of the Bernoulli distribution is updated through backpropagation, which finally yields the importance of each feature. For structure selection, we use an L-layer Mixop representation in which each Mixop group contains N candidate network structural units. In our experiments the candidates are multi-layer perceptrons with different numbers of hidden units, chosen from {1024, 512, 256, 128, 64}, plus a structural unit with 0 hidden units, which allows networks with different numbers of layers to be selected.

[Formula (4)]

  • Efficiency modeling: To include an efficiency metric in the model objective, we need a differentiable learning objective that represents the model's time cost. The time cost of the coarse ranking model consists mainly of feature time cost and model structure time cost.

For feature time cost, the expected latency of each feature fi can be modeled as shown in formula (5), in terms of the latency of each feature as recorded by the server.

[Formula (5)]
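For a feature gated by a Bernoulli mask with keep probability θ, the expected latency contribution is simply θ times the recorded latency of that feature, which is what makes the term differentiable in θ. The sketch below checks this against sampled masks; the θ values, the latencies, and the function names are hypothetical, and the way θ actually receives gradients during training is not covered.

```python
import numpy as np

def sample_feature_mask(theta, rng):
    """Draw one binary mask per feature: 1 keeps the feature, 0 drops it."""
    return (rng.random(theta.shape) < theta).astype(np.float64)

def expected_feature_latency(theta, recorded_latency_ms):
    """E[mask_i * latency_i] = theta_i * latency_i for a Bernoulli(theta_i) mask."""
    return theta * recorded_latency_ms

theta = np.array([0.9, 0.5, 0.1])                # hypothetical keep probabilities
recorded_latency_ms = np.array([1.0, 4.0, 8.0])  # hypothetical server-side timings
expected = expected_feature_latency(theta, recorded_latency_ms)

# Monte Carlo check: the average latency over sampled masks approaches the expectation.
rng = np.random.default_rng(0)
masks = np.stack([sample_feature_mask(theta, rng) for _ in range(20000)])
empirical = (masks * recorded_latency_ms).mean(axis=0)
```

A feature that is expensive to fetch therefore has to earn its keep: it raises the latency term in proportion to θ, so θ is pushed down unless the feature improves the ranking objective enough to compensate.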

In practice, features fall into two categories. One category is passed through from upstream, and its latency comes mainly from upstream transmission; the other is obtained locally (by reading KV storage or by computation). The latency of each feature combination can then be modeled as:

[Formula (6)]

where two terms denote the sizes of the corresponding feature sets, and two further parameters model the concurrency with which the system pulls features.

For latency modeling of the model structure, see the right part of Figure 7 above. Because the Mixops are executed sequentially, the model structure latency can be computed recursively, and the time cost of the whole model part can be expressed through the last Mixop layer. A schematic is shown in Figure 8 below:

Figure 8 Schematic of model latency calculation

The left side of Figure 8 is a coarse ranking network equipped with network architecture selection, in which each candidate neural unit of each layer carries its own weight. The right side is a schematic of the network latency calculation. The time cost of the whole model prediction part can therefore be expressed through the last layer of the model, as shown in formula (7):

[Formula (7)]
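One way to picture the recursive calculation is the sketch below, which propagates a distribution over layer widths through sequential Mixop layers and accumulates the expected cost of each layer. It is an illustration under assumptions rather than the article's formula (7): the softmax over architecture logits, the `latency_fn` cost model, and the treatment of the 0-width unit as a free skip connection are all hypothetical choices.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def expected_model_latency(arch_logits, widths, input_width, latency_fn):
    """Expected latency of L sequential Mixop layers.

    arch_logits: list of L arrays, one architecture logit per candidate unit.
    widths:      hidden sizes of the candidate units; 0 means "skip this layer".
    latency_fn:  latency_fn(in_width, out_width) -> latency of one dense unit.
    """
    width_dist = {input_width: 1.0}  # distribution over the input width of the current layer
    total = 0.0
    for logits in arch_logits:
        probs = softmax(np.asarray(logits, dtype=float))
        next_dist = {}
        for in_w, p_in in width_dist.items():
            for out_w, p_op in zip(widths, probs):
                if out_w == 0:
                    # Skip unit: no cost, the width passes through unchanged.
                    next_dist[in_w] = next_dist.get(in_w, 0.0) + p_in * p_op
                else:
                    total += p_in * p_op * latency_fn(in_w, out_w)
                    next_dist[out_w] = next_dist.get(out_w, 0.0) + p_in * p_op
        width_dist = next_dist
    return total

widths = [1024, 512, 256, 128, 64, 0]

def cost(in_w, out_w):
    """Hypothetical cost model: latency proportional to the number of multiply-adds."""
    return 1e-6 * in_w * out_w

uniform_one_layer = expected_model_latency([np.zeros(6)], widths, 100, cost)
```

The recursion is needed because the cost of a layer depends on the width chosen by the layer before it; in training, the same computation would be written in an autodiff framework so that the latency term produces gradients for the architecture parameters.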

Finally, we introduce the efficiency metric into the model. The final training loss is shown in formula (8) below, where f denotes the fine ranking network and the remaining symbols denote a balance factor and the scoring outputs of coarse ranking and fine ranking, respectively.

[Formula (8)]

By jointly optimizing the effect and prediction performance of the coarse ranking model through neural architecture search, we obtained offline Recall@150 +11PP and, with no increase in online latency, online CTR +0.12%. Details can be found in [13], which has been accepted by KDD 2022.

4. Summary

Starting in 2020, we deployed an MLP model in the coarse ranking layer through extensive engineering performance optimization. In 2021, we continued to iterate the coarse ranking model on top of this MLP model to improve the coarse ranking effect.

First, drawing on distillation schemes commonly used in the industry to link fine ranking with coarse ranking, we ran extensive experiments at three levels (fine ranking result distillation, fine ranking prediction score distillation, and feature representation distillation) and improved the coarse ranking model without increasing online latency. Second, because traditional distillation methods do not handle structured feature information in ranking scenarios well, we developed our own scheme, based on contrastive learning, for transferring fine ranking information to coarse ranking.

Finally, recognizing that coarse ranking optimization is essentially a trade-off between effect and performance, we adopted multi-objective modeling to optimize both at once and applied automatic neural architecture search to solve the problem, letting the model automatically select the feature set and model structure with the best efficiency and effect. Going forward, we will continue to iterate on coarse ranking technology in the following directions:

  • Multi-objective modeling for coarse ranking: Coarse ranking is currently a single-objective model in essence. We are trying to apply the multi-objective modeling of the fine ranking layer to coarse ranking.
  • System-wide dynamic allocation of computing power linked with coarse ranking: Coarse ranking can control the computing power spent on recall and on fine ranking. Different scenarios require different amounts of computing power, so dynamic allocation can reduce system computing power consumption without reducing online effect. We have already achieved some online gains in this area.

5. Appendix

Traditional offline ranking metrics are mostly NDCG, MAP, and AUC. Coarse ranking, however, is in essence closer to a recall task whose goal is set selection, so traditional ranking metrics are poorly suited to measuring the iteration effect of the coarse ranking model. Following [6], we use Recall as the offline metric for coarse ranking: the fine ranking results serve as ground truth, and the metric measures how well the TopK results of coarse ranking align with those of fine ranking.

In physical terms, the metric measures the overlap between the top K of coarse ranking and the top K of fine ranking, which better matches the set-selection nature of coarse ranking.
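The overlap described above can be computed as in the following sketch. Dividing the overlap by K is an assumed normalization, and the function name and sample lists are hypothetical.

```python
def recall_at_k(coarse_ranked, fine_ranked, k):
    """Size of the overlap between the top-k of coarse and fine ranking, divided by k."""
    top_coarse = set(coarse_ranked[:k])
    top_fine = set(fine_ranked[:k])
    return len(top_coarse & top_fine) / k

# Hypothetical ranked lists of item ids for one request.
coarse = ["a", "b", "c", "d", "e", "f"]
fine = ["b", "a", "e", "x", "c", "d"]
score = recall_at_k(coarse, fine, k=3)
```

The metric ignores order inside the top K: swapping "a" and "b" in either list leaves the score unchanged, which matches the view of coarse ranking as set selection rather than precise ordering.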

6. Author introduction

Xiao Jiang, Suo Gui, Li Xiang, Cao Yue, Pei Hao, Xiao Yao, Dayao, Chen Sheng, Yun Sen, Li Qian, and others, all from the Meituan Platform/Search Recommendation Algorithm Department.

Statement: This article is reproduced from 51cto.com.