Application of deep learning in Ctrip search word meaning analysis
The big data and AI R&D team of Ctrip Tourism R&D Department provides the tourism division with a wealth of AI technology products and technical capabilities.
Search is one of the most important functions of e-commerce: most users find the products they want through search, making it the most direct way for users to express their intent and one of the traffic sources with the highest conversion rate. The vast majority of e-commerce searches are completed by entering a search term (Query) into the search box, so word meaning analysis and intent understanding of search terms have become an important part of search.
Mainstream search word meaning analysis and Query understanding involve steps such as error correction, synonym replacement, word segmentation, part-of-speech tagging, entity recognition, intent recognition, term importance weighting, and word loss. Take search in the tourism scenario as an example, as shown in Figure 1. When the user enters "Yunnan Xiangge Lira" as the Query, the search engine first corrects the search term so that the subsequent steps can parse the content the user actually wants, and performs synonym replacement where necessary. It then segments the corrected query and tags parts of speech, identifying "Yunnan" as a province and "Shangri-La" as a city or hotel brand. Next, entity recognition recalls the entity IDs corresponding to "Yunnan" and "Shangri-La" from the backend database.
At this point an ambiguity arises: "Shangri-La" may be both a city and a hotel brand. Whether the correct category and entity can be predicted when users search matters greatly for displaying accurate results and improving the user experience, so we must identify the category the user really wants and find the corresponding entity; otherwise, results the user does not want may appear at the top of the search results page. From prior knowledge, a user searching for "Yunnan Shangri-La" most likely wants a city. The intent recognition step realizes this function, identifying that the user's true search intent is the city "Shangri-La".
The search can then move on to the recall step, which is mainly responsible for finding products or content related to the intent of the search term. Having obtained the IDs of "Yunnan" and "Shangri-La" in the previous steps, we can easily recall products or content related to both. Sometimes, however, the recall results are empty or too sparse, which makes for a poor user experience; in that case, word loss and a secondary recall are often required. In addition, words that are omissible, or that interfere with the search, can also be handled by word loss.
Word loss means throwing away the relatively unimportant or loosely related terms in the search query and recalling again. So how do we measure the importance or relatedness of each word? This is where the Term Weighting module comes in: it treats each word as a term and computes a weight for each term through algorithms or rules. These weights directly determine the ordering of terms by importance and relatedness. For example, if the term weight of "Yunnan" is 0.2 and the term weight of "Shangri-La" is 0.8, then when words must be dropped, "Yunnan" should be dropped first and "Shangri-La" kept, as in the sketch below.
Figure 1 Search word meaning analysis and Query understanding steps
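As a hedged illustration of word loss driven by term weights, a minimal sketch might look like the following; the threshold and function name are assumptions, not Ctrip's implementation:

```python
# Minimal sketch of weight-based word loss: drop the least important
# term(s) before the secondary recall. Threshold is illustrative only.
def drop_terms(term_weights, keep_at_least=1, threshold=0.3):
    """term_weights: dict mapping term -> weight in [0, 1]."""
    # Sort terms from most to least important.
    ranked = sorted(term_weights, key=term_weights.get, reverse=True)
    kept = [t for t in ranked if term_weights[t] >= threshold]
    # Never drop everything; keep at least the top-weighted terms.
    return kept if len(kept) >= keep_at_least else ranked[:keep_at_least]

# Example from the article: "Yunnan" = 0.2, "Shangri-La" = 0.8.
print(drop_terms({"Yunnan": 0.2, "Shangri-La": 0.8}))  # ['Shangri-La']
```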
Traditional search intent recognition relies on vocabulary matching, category probability statistics, and manually defined rules. Traditional Term Weighting likewise uses vocabulary matching and statistical methods: for example, based on the titles and contents of all products, statistics such as TF-IDF, mutual information between adjacent words, and left/right neighbor entropy are computed offline and stored in dictionaries with scores for online use. Rules can assist the judgment; for instance, industry proper nouns are directly given higher term weights, and particles lower ones. A simplified sketch of this dictionary approach follows.
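This is an illustrative sketch using a toy corpus and only the TF-IDF part (mutual information and neighbor entropy are omitted); the corpus and scores are not Ctrip's data:

```python
import math
from collections import Counter

# Offline statistics over (tokenized) product titles, stored in a
# dictionary for fast online lookup. Corpus is illustrative only.
corpus = [
    ["Yunnan", "Shangri-La", "tour"],
    ["Shangri-La", "hotel", "booking"],
    ["Yunnan", "Lijiang", "tour"],
]
n_docs = len(corpus)
doc_freq = Counter(term for doc in corpus for term in set(doc))

# Offline dictionary: term -> IDF score (rarer terms score higher).
idf = {t: math.log(n_docs / df) for t, df in doc_freq.items()}

def term_weights(query_terms):
    """Online lookup: weight each query term by its stored IDF."""
    tf = Counter(query_terms)
    return {t: tf[t] * idf.get(t, 0.0) for t in query_terms}

print(term_weights(["Yunnan", "Shangri-La"]))
```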
However, traditional search intent recognition and Term Weighting algorithms cannot achieve high precision and recall, and in particular cannot handle rare search terms, so new techniques are needed to improve the precision and recall of these two modules and their ability to adapt to rare queries. In addition, because of the high access frequency, search word meaning analysis requires a very fast response: in the travel search scenario, the response time often needs to reach the single-digit millisecond level, which is a big challenge for the algorithm.
To improve precision and recall, we use deep learning to improve the intent recognition and Term Weighting algorithms. Through sample learning, deep learning can effectively handle intent recognition and Term Weighting in a wide variety of situations. In addition, introducing large-scale pre-trained language models for natural language processing further strengthens the deep learning models, reduces the amount of sample labeling, and makes it feasible to apply deep learning to search, where labeling costs were originally high.
The problem deep learning faces, however, is that the high model complexity and deep network layers prevent the response speed from meeting the strict requirements of search. We therefore use model distillation and model compression to reduce model complexity and inference time at the cost of a slight drop in precision and recall, thereby ensuring faster responses and higher performance.
Category recognition is the main component of intent recognition. In category recognition, after the search term (Query) is segmented, each segment is tagged with the category it belongs to, along with a probability value. Analyzing the intent behind the user's search terms helps capture the user's direct search need and thus improves the user experience. For example, when a user searches for "Yunnan Shangri-La" on the travel page, the category corresponding to "Shangri-La" is "city" rather than "hotel brand", which steers subsequent search strategies toward the city intent. An illustrative result shape is shown below.
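For illustration only, a category-recognition result for "Yunnan Shangri-La" might take a shape like the following; the fields and probabilities are hypothetical, not Ctrip's actual schema:

```python
# Each segment is tagged with candidate categories and probabilities.
result = [
    {"segment": "Yunnan",     "category": "province",    "prob": 0.97},
    {"segment": "Shangri-La", "category": "city",        "prob": 0.82},
    {"segment": "Shangri-La", "category": "hotel brand", "prob": 0.18},
]
```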
In the travel scenario, search terms with ambiguous categories account for about 11% of the total, including a large number of search terms that cannot be segmented further. "No word segmentation" means that word segmentation produces no finer segments, and "category ambiguity" means that the search term itself has multiple possible categories. For example, when a user enters "Shangri-La", it cannot be segmented further, and the corresponding category data contains multiple categories such as "city" and "hotel brand".
If the search term is a combination of multiple words, its category can often be clarified from the context within the term itself, so the term itself is prioritized as the identification target. If the category cannot be determined from the search term alone, we first append the user's recent, mutually distinct historical search terms and recent product category click records; if that information is unavailable, we append the user's positioned location as supplementary corpus. Processing the original search term this way yields the Query R to be identified, as sketched below.
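This is a hedged sketch of that fallback logic; the function name, field names, and separator token are assumptions for illustration:

```python
# Assemble the query-to-identify R: fall back from the search term
# itself to recent searches, recent category clicks, and the user's
# positioned location. Separator and priorities are assumptions.
def build_query_r(query, recent_searches=None, recent_clicks=None,
                  located_city=None, sep=" [SEP] "):
    context = []
    if recent_searches:
        # Only keep recent searches that differ from the current query.
        context += [q for q in recent_searches if q != query]
    if recent_clicks:
        context += recent_clicks
    if not context and located_city:
        context.append(located_city)
    return query if not context else query + sep + sep.join(context)

print(build_query_r("Shangri-La",
                    recent_searches=["Yunnan tours"],
                    located_city="Kunming"))
```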
In recent years, pre-trained language models have shone in many natural language processing tasks. In category recognition, we use the pre-trained model's network parameters to obtain character features Output_bert that contain contextual semantics. The word conversion module then combines the character features with position encoding and obtains the character fragment corresponding to each word segment, such as:

W_i = [h_{i,1}, h_{i,2}, ..., h_{i,l_i}]

which denotes the l_i character features corresponding to the i-th word segment. Based on the character fragment W_i, the word conversion module aggregates the features of each word into H_{w_i}. The aggregation method can be max-pooling, min-pooling, mean-pooling, and so on; experiments show that max-pooling works best. The output of the module is the word feature Output_R of the search term R. Finally, a parallel classifier assigns each fragment in Output_R the matching categories covered by the category database, along with the matching probability of each category.
Figure 2 Schematic diagram of the overall structure of category recognition
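As a hedged illustration of the aggregation step above, the following sketch pools BERT character features into word features; the segment boundaries, hidden size, and function name are assumptions:

```python
import torch

# Word-conversion aggregation: pool the character features inside each
# word segment into one word feature.
def aggregate_words(char_feats, segments, mode="max"):
    """char_feats: (seq_len, hidden) tensor of per-character features.
    segments: list of (start, end) index pairs, one per word segment."""
    word_feats = []
    for start, end in segments:
        frag = char_feats[start:end]          # W_i: l_i character features
        if mode == "max":                     # max-pooling worked best
            word_feats.append(frag.max(dim=0).values)
        elif mode == "mean":
            word_feats.append(frag.mean(dim=0))
        else:                                 # min-pooling
            word_feats.append(frag.min(dim=0).values)
    return torch.stack(word_feats)            # Output_R: one row per word

# e.g. 6 characters split into two segments: "Yunnan" and "Shangri-La".
feats = aggregate_words(torch.randn(6, 768), [(0, 2), (2, 6)])
print(feats.shape)  # torch.Size([2, 768])
```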
The category recognition model is based on the 12-layer BERT-base model. Since this model is too large to meet the response speed required for online serving, we performed knowledge distillation on it, transforming the large network into a small one that retains performance close to the large network while meeting the online latency requirements.
The originally trained category recognition model serves as the teacher network, and its output is used as the target of the student network, which is trained so that its output p approaches the teacher's output q. The loss function can therefore be written as:

Loss = CE(y, p) + symmetricKL(q, p)
Here CE is the cross entropy, symmetricKL is the symmetric KL divergence (Kullback-Leibler divergence), y is the one-hot encoding of the true label, q is the output of the teacher network, and p is the output of the student network.
Figure 3 Schematic diagram of knowledge distillation
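A minimal PyTorch sketch of this distillation loss follows; the relative weighting between the two terms (alpha) is an assumption, as the article does not specify it:

```python
import torch
import torch.nn.functional as F

# Distillation loss: CE between student output p and true label y, plus
# a symmetric KL divergence between teacher output q and student p.
def distill_loss(student_logits, teacher_logits, labels, alpha=0.5):
    p_log = F.log_softmax(student_logits, dim=-1)
    q_log = F.log_softmax(teacher_logits, dim=-1)
    p, q = p_log.exp(), q_log.exp()
    ce = F.cross_entropy(student_logits, labels)          # CE(y, p)
    # symmetric KL: KL(q || p) + KL(p || q)
    sym_kl = (F.kl_div(p_log, q, reduction="batchmean") +
              F.kl_div(q_log, p, reduction="batchmean"))
    return alpha * ce + (1 - alpha) * sym_kl

loss = distill_loss(torch.randn(4, 10), torch.randn(4, 10),
                    torch.randint(0, 10, (4,)))
```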
After knowledge distillation, category recognition still achieves relatively high precision and recall, while the 95th-percentile (P95) response time of the overall service is about 5 ms.
After category recognition, entity linking and other steps are still required to complete the final intent recognition. For details, please refer to the article "Exploration and Practice of Ctrip Entity Link Technology", which this article will not elaborate on.
4. Term Weighting

For the search terms entered by the user, different terms carry different importance for the user's core semantic appeal. In the secondary recall and ranking of search, terms with high importance need to be emphasized, while terms with low importance can be ignored during word loss. By computing the term weight of each term the user enters, the products closest to the user's intent can be recalled in the secondary recall, improving the user experience.

First, we need real feedback data from online users as annotation data. The user's input in the search box and clicks on suggested (associated) words reflect, to some extent, which words in the search phrase the user emphasizes. We therefore use the input and suggestion-click data, after manual screening and secondary annotation, as the training data for the Term Weighting model.

In terms of data preprocessing, the annotated data we obtain are phrases and their corresponding keywords. To keep the weight distribution from being too extreme, a certain small weight is given to non-keywords, and the remaining weight is distributed over the characters of the keyword. If a phrase appears multiple times in the data with different keywords, the weights are apportioned by keyword frequency and then further assigned to each character.

The model part mainly uses BERT for feature extraction and then fits the weight of each term. A given input is converted into a form BERT can accept; the tensor output by BERT is compressed through a fully connected layer into a one-dimensional vector, which is then passed through Softmax, and this vector is fitted to the labeled weight vector. The specific model framework is shown in the figure below.

Figure 4 Term Weighting model framework
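A minimal sketch of the model head just described, assuming PyTorch and a 768-dimensional BERT-base hidden size; the class and variable names are illustrative:

```python
import torch
import torch.nn as nn

# Term Weighting head: compress the BERT output with a fully connected
# layer to one score per character, then Softmax so weights sum to 1.
class TermWeightHead(nn.Module):
    def __init__(self, hidden_size=768):
        super().__init__()
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, bert_out, mask):
        """bert_out: (batch, seq_len, hidden); mask: (batch, seq_len)."""
        scores = self.fc(bert_out).squeeze(-1)        # (batch, seq_len)
        scores = scores.masked_fill(mask == 0, -1e9)  # ignore padding
        return torch.softmax(scores, dim=-1)          # weights sum to 1

weights = TermWeightHead()(torch.randn(2, 8, 768), torch.ones(2, 8))
# Training fits these per-character weights to the labeled distribution,
# e.g. with an MSE loss.
```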
Since Chinese BERT operates at the character level, the weights of all characters in each term are summed to obtain the weight of the term. In the overall framework, apart from some training hyperparameters, there are two main adjustable parts. First, when generating embeddings with BERT, one can use only the last layer, or combine the first and last layers. Second, for the loss function, besides using MSE loss to measure the gap between the predicted and labeled weights, we also tried using the sum of the predicted weights of non-important words as the loss, although this loss is better suited to cases with only a single keyword.

The model finally outputs each term weight as a decimal. For example, the term weight results for ["Shanghai", "'s", "Disney"] are [0.3433, 0.1218, 0.5349].

This model serves search and has strict response time requirements. Since the BERT model is relatively large, the inference step struggles to meet them, so, as with the category recognition model, we further distill the trained BERT model to meet online requirements. In this project, a transformer of a few layers is used to fit the effect of the 12-layer BERT-base transformer. In the end, the model's overall inference is about 10 times faster at the cost of an acceptable loss in performance, and the 95th-percentile (P95) response time of the Term Weighting online service reaches about 2 ms.

5. Future and Prospects

After adopting deep learning, travel search has greatly improved its word meaning analysis for rare long-tail search terms. In real online search scenarios today, deep learning methods are generally combined with traditional word meaning analysis methods; this ensures stable performance on common head search terms while strengthening generalization.

In the future, search word meaning analysis will keep striving to bring users a better search experience. As hardware and AI technology upgrade and high-performance computing and intelligent computing mature, the intent recognition and Term Weighting of search word meaning analysis will develop toward higher performance goals. Larger-scale pre-trained models, and pre-trained models for the tourism domain, will help further improve the precision and recall of the models; introducing more user information and knowledge will help improve intent recognition; and online user feedback with model iteration will help improve Term Weighting. These are the directions we will try in the future. Beyond intent recognition and Term Weighting, other search functions such as part-of-speech tagging and error correction can also adopt deep learning in the future to achieve more powerful functionality and better results while meeting response speed requirements.