Qunar built an independent portrait tag system for each business line as those businesses developed. As the company continues to grow, the portrait tag systems of the individual businesses need to be integrated. From a technical perspective the integration is relatively simple, but integration at the business level is more complex: the same tag can be defined differently in different businesses, which makes integration harder. To ensure that the integrated tag system serves the company's overall strategy, the tag definitions need in-depth extraction and optimization so that every tag stays logical and consistent.
User behavior refers to the operations a user performs in the application, while business logs refer to data generated by users on the server side, such as clicks, orders, and searches. Portrait tags are multi-dimensional user data obtained by analyzing user behavior and business data with rule-based statistics and mining algorithms. This analysis gives a clearer picture of user preferences and needs, so that users can be offered more personalized and accurate services. Portrait tags also help companies locate target user groups, formulate targeted marketing strategies, and improve user experience, which in turn improves user satisfaction and loyalty.
When each business department builds its own portrait tag platform, its needs differ because its goals differ: for example, the air ticket business usually targets marketing, while the hotel business usually targets service. We should start from actual business needs and communicate with people at every level, from company management to interns, conducting in-depth requirements research to ensure that the integrated tag system meets business needs. During the integration process, user portrait tag requirements fall mainly into three categories: marketing risk control, internal business analysis applications, and describing users.
The portrait tag construction process is divided into business classification and technical classification.
The user portrait categories required by the business are extracted from user needs. They are organized mainly into first- and second-level categories, with business processes as the main basis for classification, and are continuously expanded and improved.
In addition, according to different technical requirements, we need to choose an appropriate technology stack to generate, store, and call portrait tags.
First, the definition and goal of each portrait tag must be made clear in order to determine which technology is needed. Second, the update cycle and access method of the tag must be considered, because they determine whether the tag is processed online or offline and which storage resources are selected. Finally, based on these factors, we can choose an appropriate technology stack to implement the portrait tag system and ensure its performance and stability. With this technical classification, the portrait tag system is easier to manage and maintain, and its scalability and usability improve.
In addition to the hourly, weekly, and monthly update cycles already listed, we also currently support real-time tag updates, which are closer to streaming updates.
Because the portrait tag platform has to process a large amount of data and many user requests, the access method must be chosen to suit the back-end technology stack. For large companies, the number of users and the volume of data are very large, so we need to consider how to store and retrieve tags effectively. Some tags only need to be built offline, while others need to be called online. For offline tags, we can choose resources that do not incur high storage costs, such as storing data in Redis or HBase. For online tags, the system must respond to user requests quickly and provide a stable service. When choosing an access method, we therefore need to weigh these trade-offs against the actual situation to ensure system performance and stability.
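The selection logic described above can be sketched as a small decision function. This is only an illustration under stated assumptions, not Qunar's implementation: `TagSpec`, `plan_tag`, and the tier names `low_cost_store` and `low_latency_store` are hypothetical placeholders, and the rule assumes that only real-time tags require streaming computation.

```python
from dataclasses import dataclass
from enum import Enum


class UpdateCycle(Enum):
    """Update cycles mentioned in the text."""
    REAL_TIME = "real_time"
    HOURLY = "hourly"
    WEEKLY = "weekly"
    MONTHLY = "monthly"


@dataclass(frozen=True)
class TagSpec:
    """Hypothetical tag metadata used to pick processing and storage."""
    name: str
    update_cycle: UpdateCycle
    served_online: bool  # True if the tag is called by online services


def plan_tag(spec: TagSpec) -> dict:
    """Map a tag's update cycle and access method to a processing mode and storage tier."""
    # Assumption: real-time tags are computed by a streaming job, all others in batch.
    processing = "streaming" if spec.update_cycle is UpdateCycle.REAL_TIME else "batch"
    # Assumption: online access needs a low-latency tier; offline-only tags
    # can live in a cheaper tier. The tier names are placeholders.
    storage = "low_latency_store" if spec.served_online else "low_cost_store"
    return {"tag": spec.name, "processing": processing, "storage": storage}


if __name__ == "__main__":
    print(plan_tag(TagSpec("recent_search_city", UpdateCycle.REAL_TIME, True)))
    print(plan_tag(TagSpec("monthly_order_count", UpdateCycle.MONTHLY, False)))
```

Keeping the decision in one function makes the trade-off explicit: the update cycle drives the processing mode, while the access method drives the storage tier.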
In the production process of the portrait tag system, a series of processing steps must be applied to various data sources before tags are finally generated. ID Mapping is a key link in this process. Its goal is to solve the problem of different IDs pointing to the same person, especially for early-stage companies. Because of the variety of registration methods, multiple IDs may correspond to the same user. For example, a user may register by email and later bind or change a mobile phone number, or may have been allowed to use the service without logging in. Any of these situations can leave multiple IDs corresponding to the same user.
In order to solve this problem, ID Mapping is responsible for the task of realizing multi-device association. In addition, ID Mapping is also a crucial basic step for risk control. Through ID Mapping, users of different devices can be better identified and associated, allowing for better risk control and security management. Through reasonable ID Mapping design and management, we can better protect user privacy and data security, while improving the accuracy and reliability of the portrait labeling system.
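One common way to merge identifiers that point to the same person is a union-find (disjoint-set) structure. The sketch below is an assumption for illustration; the article does not say which method Qunar uses. The class name `IdMapper` and the identifier strings are hypothetical.

```python
from collections import defaultdict


class IdMapper:
    """Union-find over identifiers (email, phone, device) observed together."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative ID of the group containing x."""
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def link(self, a, b):
        """Record that identifiers a and b were seen for the same user."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Deterministic choice: the lexicographically smaller root wins.
            self.parent[max(ra, rb)] = min(ra, rb)

    def groups(self):
        """Return the sets of identifiers that map to one user."""
        out = defaultdict(set)
        for x in list(self.parent):
            out[self.find(x)].add(x)
        return list(out.values())


if __name__ == "__main__":
    mapper = IdMapper()
    mapper.link("email:E1", "phone:P1")   # registered by email, later bound a phone
    mapper.link("phone:P1", "device:D1")  # logged in on a device
    mapper.link("phone:P2", "device:D2")  # an unrelated user
    print(mapper.groups())
```

A plain union like this would merge two people who merely share a device, so a production system would need rules for deciding which observations count as links before calling `link`.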
##3. Common algorithm portrait tags
In practice, depending on the samples and technology stack, common algorithms for model-based tags fall into the following categories:
(1) Classification algorithm: Predictive tags are used for audience selection and business filtering in business processes. Training and optimizing the model requires sufficient sample data, which improves prediction accuracy. Predictive tags are not limited to order payment prediction; they can also include search-to-payment prediction, search prediction, detail page prediction, and so on.
(2) Recommendation algorithm: Related to sorting and prioritization, and requires a broader range of cutting-edge knowledge and technology stacks. The goal of the recommendation algorithm is to recommend suitable hotel room types to users from the recall set. For example, in a parent-child travel scenario, it can recommend suitable room types such as twin rooms or suites.
(3) Knowledge graph: Uses graph database technology to better reveal users and the relationships around them. It has many applications in risk control scenarios, such as identifying abnormal users and determining whether they are malicious.
(4) Causal inference: One example is explaining the impact on marketing effectiveness of sending text messages and push messages to users, which also involves cost.
(5) Graphics and images: Combines graphics and image processing technology to tag graphics and images. It involves image segmentation, recognition, and similar techniques, but more often it works in reverse, applying user tags to image tagging. For example, for users who post inappropriate comments, their tags are extracted and fed into the image tagging algorithm to improve the efficiency and accuracy of tagging.
(6) NLP robot
(7) Lookalike marketing algorithm: An algorithm for expanding marketing reach from seed users. It is classified in different ways depending on the type of demand.
2. Lookalike algorithm based on knowledge graph and frequent patterns
Relying only on portrait tags for filtering may produce a large number of target users who do not meet the requirements, and how to rank them becomes a difficult problem. Traditional methods, such as sorting by value or activity, can hardly guarantee that the selected users are the most similar to the target user group. With knowledge graphs or frequent patterns, we can measure the similarity between users, and this similarity is quantifiable and scalable. At the relationship level, the algorithm can more accurately find user groups similar to the target users.
3. Lookalike algorithm based on causal inference
Compared with traditional association rules and portrait tags, causal inference can answer deeper questions. Association rules and portrait tags mainly address correlation, such as "users who buy beer may also buy diapers", but cannot explain why the correlation exists, and the correlation may not hold in different cultures and markets. Causal inference over historical data and models can therefore find the key factors that affect user behavior and conversion. These key factors can be found through relationship discovery, which in turn helps us better understand user behavior and business processes. For example, the red part in the upper right corner of the figure uses an understanding of the business to filter out the parts that better reflect the business process, so as to expand to more users.
4. The portrait of the object
In constructing an object portrait, we mainly focus on the attributes and characteristics of the objects, such as cities, business districts, routes, and flights in the hotel portrait. These attributes help us describe and understand objects more accurately and provide rich content for their portraits. Compared with user portraits, object portraits emphasize the similarity between objects. In practice, we usually use object similarity for operations such as recommendation and ranking. Various methods can be used to measure the similarity between objects, such as attribute vectors and embeddings. These methods represent objects as vectors and use the vectors for similarity calculations.
Although the process of building object portraits is similar to that of building user portraits, in actual applications we need to make appropriate adjustments and optimizations based on business needs and scenarios. The relationships and hierarchical structures between objects also need in-depth analysis to ensure that the object portraits accurately reflect business needs. In addition, some key issues need attention when constructing object portraits.
(1) Correlation does not mean similarity. For example, when using the embedding method, if high-value user groups are all searching for five-star hotels, the correlation between these five-star hotels may be strong. In some business scenarios, however, this correlation may not apply, so we need to consider object similarity carefully in light of the specific business scenario.
(2) Cold start problem. For example, in hotel profiling, a newly listed hotel may lack user behavior data. To solve this problem, we can use attribute distance: extract tag attributes on broad dimensions, construct a user-friendly portrait tag, and use this tag for similarity calculations.
(3) Interpretability
##4. Portrait tag application scenarios
Application 1: Marketing crowd selection and diffusion
Portrait tags play a vital role in the selection and diffusion process of marketing. By using portrait tags sensibly, operators can analyze and screen the selected user groups in more detail. When operators feel that the initially selected group is too large or too small, or that the marketing effect needs to be expanded or optimized, they can diffuse or re-select the group through portrait tags to achieve better marketing results. The most common issue in selection and diffusion is the four quadrants formed by user conversion and operational intervention. The quadrants represent different user conversion states and call for different intervention strategies. For example, for users with high conversion and low intervention, the strategy can be to maintain the status quo; for users with low conversion and low intervention, the strategy can be to promote conversion.
The following are the four stages of marketing selection and diffusion in the application of portrait tags:
Scientific analysis: Dig deeply into user data and accurately locate target groups to improve conversion.
Auxiliary selection: Use tags to filter target users efficiently and improve the relevance and efficiency of marketing activities.
Intelligent expansion: Based on algorithms and models, intelligently classify and expand user groups to widen marketing coverage.
Model implementation: Combine with actual marketing activities to optimize portrait tags and strategies and achieve the best marketing results.
Application 2: Business indicator attribution analysis
The portrait tag system can be used to analyze the quality of business indicators and further optimize strategies. During business iteration, we usually generate strategies using methods such as attribution analysis algorithms and business analysis, and then run experiments to measure them. If the experimental strategy performs well, it is fully launched. Two problems are encountered in this process: how to analyze the quality of the indicators, and how to analyze the quality of the experimental results. To solve them, we need to conduct attribution analysis of business indicators. First, discover business problems through reports, alarms, and similar channels, find out their causes, and clarify the specific scenarios and the actual conversion relationships. Next, locate the cause of the problem and determine whether it is controllable or uncontrollable. If it is uncontrollable, it may be natural jitter and does not require much attention; if it is controllable, we need to explore further whether unknown scenarios are causing the problem. In the qualitative analysis module, we clarify controllable and uncontrollable factors and explore the causes of problems in unknown scenarios. Finally, suggestions are given to guide business personnel on what to do in which scenarios. A typical scenario is that the conversion rate of a certain business has dropped. By analyzing the entire business process, we can work out the proportions of market factors and controllable factors. If market factors account for a large proportion, the problem can be addressed later without immediately committing a lot of manpower and material resources.
Application 3: AB experiment performance analysis
While responsible for Qunar's AB experiment system, we often face some challenges. When the product team has invested a lot of time and resources in an experiment and the results are not significant, questions naturally arise, such as "Why was the experiment ineffective?" and "What is the direction of the next iteration?" To answer these questions, we conduct an AB experiment performance analysis, which is divided into three main parts. First, using the business process funnel model, core user portrait tag identification, and identification of misleading tags in the business domain, we try to determine whether the poor results were due to insufficient improvement in volume. Second, we use analysis methods such as decision trees to explore whether there are problems with the improvement in quality, such as conflicts with other experiments or an improvement that does not reach a significant proportion. Finally, we quantify the effectiveness of each action and clarify its impact on the goal. Through these analyses, we can give the product team specific guidance on choosing more efficient directions for optimization, thereby achieving a qualitative improvement. These analyses not only help optimize the direction of product iteration, but also save the company resources and time and improve overall business results.
##5. Question and Answer Session
Q1: What is the difference between user behavior and business logs?
A1: User behavior data mainly records users' interactions on the app side, such as clicks, and mainly reflects the user's interaction process. Business data covers the information processed in the back end, such as agent connection processes and logistics information. Although this data is invisible to users, it is crucial for understanding the whole business process and improving user experience. In practice, we need to incorporate both kinds of data into the portrait tag system to better analyze and understand user behavior and business processes. For an e-commerce platform, for example, some back-end data is not relevant to users, while some bears on user experience and business processes, so appropriate screening and processing are required.
Q2: How is streaming tagging currently done? Can it support more complex tag rules? Is it built through data development or configured visually?
A2: Streaming tags can be implemented through streaming computing, for example with tools such as Flink; Spark and other approaches can also be used. Complex tag rules are supported, and both approaches are available, data development and visual configuration: on the Qunar platform, users can drag and drop predefined data to compute tags through streaming computing, or upload Python or SQL code for customized calculations. For streaming tags, the computation volume and the time window need to be limited to meet different needs.
Q3: What are real-time tags?
A3: Real-time tags are tags that are calculated and applied in real time when a user behavior or business event occurs. For example, when a user submits a complaint on the front-end interface, the system analyzes the user's demands and order issues in real time and attaches the corresponding real-time tags to the user. Such tags quickly reflect user needs and problems so that they can be handled and optimized in time. Different companies define real-time tags differently. For Qunar, anything within 3 seconds is considered real-time, while hour-level updates are considered non-real-time.
Q4: Does ID Mapping resolve multiple mobile phone numbers and device numbers into one unique ID, or does it give each user a unique ID? For example, if a mobile phone number has been logged in on two devices, and one of those devices has also logged in with another mobile phone number, is that one unique ID or three?
A4: With the spread of the mobile Internet, more and more companies use mobile phone numbers as unique user identifiers. One-click login has become common practice in the industry, making it easier for users to log in and use applications. Qunar also uses the mobile phone number as the unique user ID, and in most cases we treat one mobile phone number as one user. In some special cases, we also consider the scenario where the user changes their mobile phone number and handle it accordingly. In addition, when one mobile phone number is logged in on two devices, we apply a series of judgments to determine the user's relationship to each device: a user who logs in to a device temporarily is treated as a visitor, and a user who holds the device over a long period is treated as a holder.
Q5: What are the application scenarios of product tags?
A5: The most common one is product pricing. To personalize product pricing, we need product tags. These tags are calculated from specific numerical values for internal and external factors. If internal factors are not properly sorted out, the impact of external factors may be exaggerated. This can be understood as something like a brute-force approach: we put every factor in and try it, then see how much influence each factor has, and judge for each factor whether it reflects correlation or causation.
Q6: Do real-time business tags need to be custom developed?
A6: After the real-time tags were built, we tried as far as possible, at the development level, to cover exhaustively the real-time tags that can be obtained through basic statistics. Real-time tags based on rules or models, however, must be custom developed.
Q7: How is the life cycle of tags managed?
A7: Some one-time tags are created at the beginning and are no longer used after they have served their purpose.
Q8: Can statistical methods be used to determine the minimum sample size for AB experiments? There is a standard calculation process for AB experiments. Can we know the approximate sample size required to achieve a statistically significant effect?
A8: Smaller businesses may have insufficient traffic, so reaching the minimum sample size is not achievable at the operational level. We therefore need some way to infer the experimental effect quickly and roughly when the minimum sample size has not been reached.
Q9: How are the caliber types of user portraits stored and displayed? In addition to single tags, user portraits also combine multiple tags into a user preference view. How should these two types of tags be stored?
A9: Every company is different. From a storage perspective, Qunar has multiple storage methods. We can tolerate redundant storage of some data, mainly for the sake of fast real-time response. That is, when accessing tags, we try to do so with as little latency as possible.
Q10: What are the applications of large models in the construction of algorithm tags?
A10: In my current practice at Qunar, large models are widely used in algorithmic tagging. The simplest example: when we build user portraits, we often encounter POI landmark data, which is extracted from documents. A large model may be used here, and its accuracy is, honestly, much better than some of the models we built ourselves in the past. Also, when we build a knowledge graph, we encounter tasks such as entity disambiguation and entity merging.
Q11: Do portrait algorithm engineers also need to implement ranking and recommendation?
A11: Actually no. Recommendation is done by recommendation engineers, but the recommendation algorithm needs to use the results produced by portrait engineers. Portrait engineers need to describe clearly the quality and application scenarios of portrait tags so that recommendation and ranking engineers can make better use of them.
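Q8 above refers to a standard calculation for the minimum sample size of an AB experiment. As a supplementary sketch, the textbook formula for comparing two conversion rates with a two-sided z-test can be written as follows. The function name `min_sample_size` and the default significance level and power are assumptions for illustration; the article does not describe the exact procedure used at Qunar.

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_sample_size(p_base: float, mde: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size for detecting an absolute lift `mde` over `p_base`.

    Uses the normal approximation for a two-sided test of two proportions.
    """
    p_new = p_base + mde
    p_bar = (p_base + p_new) / 2                    # pooled rate under the null
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for significance
    z_beta = NormalDist().inv_cdf(power)            # quantile for the desired power
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_base * (1 - p_base) + p_new * (1 - p_new))
    ) ** 2
    return ceil(numerator / mde ** 2)


if __name__ == "__main__":
    # Example: baseline conversion of 10%, hoping to detect a 2-point absolute lift.
    print(min_sample_size(p_base=0.10, mde=0.02))
```

Because the required size grows roughly with the inverse square of the detectable lift, halving the lift roughly quadruples the traffic needed, which is why low-traffic businesses struggle to reach the minimum sample size.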