Portrait tag system construction and application practice

王林
2024-03-07 11:50:07

## 1. Portrait tag system

Qunar built an independent portrait tag system for each business line as it developed. As the company continues to grow, the portrait labeling systems of the individual businesses need to be integrated. From a technical perspective the integration is relatively simple, but at the business level it is more complex: the same label is defined differently in different businesses, which makes integration harder. To ensure that the integrated label system serves the company's overall strategy, in-depth keyword extraction and optimization are required so that each label is logical and consistent.

1. What is a portrait tag

User behavior refers to the operations a user performs in the application, while business logs refer to data generated by users on the server side, such as clicks, orders, and search behaviors. Portrait tags are multi-dimensional user data obtained by analyzing user behavior and business data through rule-based statistics and mining algorithms. They let us understand user preferences and needs and thereby provide more personalized and accurate services. They also help companies locate target user groups, formulate targeted marketing strategies, understand user behavior patterns, and improve user experience, satisfaction, and loyalty.

2. Source of demand for portrait tags

When each business department builds its own portrait tag platform, its needs differ because its goals differ: for example, the air ticket business usually targets marketing, while the hotel business usually targets service. We should start from actual business needs and conduct in-depth demand research with people at every level, from company management to interns, to ensure that the integrated labeling system meets business needs. During the integration, user portrait label requirements fall into three main categories: marketing risk control, internal business analysis applications, and describing users.

  • Marketing risk control: user marketing, personalized recommendations, precise advertising, and user risk control.
  • Business analysis: business optimization analysis, multi-dimensional business indicator monitoring, and guidance on new business product design.
  • Describing users: the definition of a single user, the positioning of platform users, and industry reports.

3. Classification of portrait tags

The construction of portrait tags is classified in two ways: business classification and technical classification.

For business classification, extract the user portrait categories that the business requires from user needs. Work mainly at the first- and second-level categories, use business processes as the main basis for classification, and keep expanding and improving the categories.

In addition, according to different technical requirements, we need to choose an appropriate technology stack to generate, store, and call portrait tags.

First, clarify the definition and goals of the portrait tag in order to determine which technology is needed. Second, consider the tag's update cycle and access method, which determine whether the tag is processed online or offline and which storage resources are selected. Finally, based on these factors, choose the technology stack that implements the portrait labeling system with the required performance and stability. This technical classification makes the portrait tag system easier to manage and maintain and improves its scalability and usability.

(1) Construction method

  • Statistical class: can be completed with SQL alone.
  • Rule class: built by people with a business background, such as data analysts, business analysts, and product operators, based on their understanding of the business. These labels change as that understanding changes.
  • Model class: requires the algorithm team to perform complex calculations, or requires sample data. Unlike basic labels, model labels face accuracy challenges and cannot be 100% accurate, because the number of samples available is sometimes very limited and a high level of accuracy is hard to maintain. For model class labels we may therefore need other methods and techniques to improve accuracy and usability.
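
To make the first two construction methods concrete, here is a minimal Python sketch, not Qunar's implementation. The order log layout, the function names, and the threshold of 3 orders are all hypothetical; in production the statistical step would more likely be a SQL aggregation.

```python
from collections import Counter

# Hypothetical order log: (user_id, business_line)
orders = [
    ("u1", "flight"), ("u1", "hotel"), ("u1", "flight"),
    ("u2", "hotel"),
]

def order_count_label(order_log):
    """Statistical label: number of orders per user (a GROUP BY in SQL)."""
    return Counter(user_id for user_id, _ in order_log)

def frequent_buyer_label(order_counts, min_orders=3):
    """Rule label: a business-defined threshold applied to a statistical label."""
    return {user_id: count >= min_orders for user_id, count in order_counts.items()}

counts = order_count_label(orders)
rule_labels = frequent_buyer_label(counts)
```

The rule label is deliberately a thin function over the statistical label: when the business changes its definition of a frequent buyer, only `min_orders` changes.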

(2) Update cycle

In addition to the hourly, weekly, and monthly update cycles already listed, we now also support real-time label updates, which are closer to streaming updates.

(3) Access method

The portrait labeling platform has to handle a large amount of data and many user requests, so the access method must be chosen to suit the back-end technology stack. For large companies, where the number of users and the volume of data are very large, we need to consider how to store and recall tags efficiently. Some tags only need to be built offline, while others need to be called online. For offline tags, we can choose resources that do not incur high storage costs, such as storing data in Redis or HBase. For online tags, the system must respond quickly to user requests and provide stable service. The choice of access method is therefore a trade-off made according to the actual situation to ensure system performance and stability.

4. The construction process of the portrait label system

In producing the portrait label system, we need to run a series of processing steps over various data sources to finally generate labels. ID Mapping is a key link in this chain. Its goal is to resolve different IDs that point to the same person, a problem that is especially common for early-stage companies. Because there are various registration methods, multiple IDs may correspond to the same user. For example, users can bind or change their mobile phone number after registering by email, or they may have been allowed to use the product without logging in.

To solve this problem, ID Mapping takes on the task of multi-device association. It is also a crucial foundation for risk control: with ID Mapping, users on different devices can be identified and associated more reliably, which supports better risk control and security management. Well-designed and well-managed ID Mapping helps protect user privacy and data security while improving the accuracy and reliability of the portrait labeling system.
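
One common way to resolve several identifiers to one person is a union-find (disjoint-set) structure. The sketch below is an illustration under that assumption; the article does not say which technique Qunar uses. The `IdMapper` class and the `email:`, `phone:`, and `device:` prefixes are hypothetical.

```python
class IdMapper:
    """Union-find over identifiers such as emails, phone numbers, and device IDs."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative identifier of the group that x belongs to."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def link(self, a, b):
        """Record that identifiers a and b were observed for the same person."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same_user(self, a, b):
        return self.find(a) == self.find(b)

mapper = IdMapper()
mapper.link("email:a@example.com", "phone:13800000000")  # phone bound after email signup
mapper.link("phone:13800000000", "device:abc")           # login on a device
mapper.link("device:xyz", "phone:13900000000")           # an unrelated user
```

Every `link` call merges two groups permanently, so a real system would only link on strong evidence; a temporary login on a shared device, for instance, should probably not merge two people.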

## 2. Portrait tag platform

The portrait tag platform is also called a CDP platform. It includes services such as portrait label production, data analysis, business applications, and effect analysis. The figure below shows the functional architecture of the Qunar CDP platform.

[Figure: functional architecture of the Qunar CDP platform]

After the outbreak, Qunar strengthened its internal capability building and integrated portrait tags with its mainstream strategy platforms. The platform now covers the entire life cycle of portrait tags and supports portrait construction, crowd selection, and the final marketing actions. This integration makes data-driven marketing strategies easier to realize and connects user portraits seamlessly with marketing activities. It helps improve marketing effectiveness and user satisfaction and also supports data integration and collaboration within the company.

## 3. Common algorithm portrait tags

1. Common algorithm types for model class labels

In practice, based on samples and technology stacks, the common algorithms for model class labels can be divided into the following categories:

(1) Classification algorithm: Predictive labels are used for crowd selection and business filtering in business processes. Training and optimizing the model requires sufficient sample data in order to improve prediction accuracy. Prediction tags are not limited to order payment predictions; they can also include search payment predictions, search predictions, detail page predictions, and so on.

(2) Recommendation algorithm: related to sorting and prioritization, requiring a wider range of cutting-edge knowledge and technology stack. The goal of the recommendation algorithm is to recommend suitable hotel room types to users from the recall set. For example, for parent-child travel scenarios, the recommendation algorithm can recommend suitable hotel room types such as twin rooms or suites to users.

(3) Knowledge graph: Use graph database technology to better reveal users and their surrounding relationships. There are many applications in risk control scenarios, such as identifying abnormal users and determining whether they are malicious users.

(4) Causal inference: Used, for example, to explain how sending text messages and push messages to users affects marketing results, taking cost into account.

(5) Graphics and images: Combine graphics and image processing technology to label graphics and images. This involves image segmentation, recognition, and related technologies, but more often user tags are applied in reverse to image labeling. For example, for users who post inappropriate comments, their labels are extracted and fed into the image labeling algorithm to improve the efficiency and accuracy of labeling.

(6) NLP robot

(7) Lookalike marketing algorithm: an algorithm for expansion marketing through seed users.

There will be different classification methods based on demand types:

  • Single entity: Find other related entities through relationship networks or knowledge graphs. For example, knowledge graphs can be used to discover relationships between entities, thereby extending the associated entities of a single entity.
  • Business entity set: tags tied to a specific business, generated by the business itself rather than controlled by humans. For example, to market to hotel search users or air ticket search users and expand the business, we need to understand user needs and behaviors through in-depth analysis and mining of business entity tags, and so optimize business strategies and improve conversion rates and user experience. The business entity set can be expanded through brand models, association rules, the solution labeling platform, and so on, to obtain richer portrait labels or portrait users.
  • Rule entity set: labels generated from specific rules or conditions. The product team typically creates them with tag tools, based on its understanding of the business, to select user groups that meet specific rules. For example, when recommending itineraries or room types, some users may have purchased both air tickets and hotels in Beijing; users with such a specific behavior chain can serve as a target group for marketing promotion. This set can be processed with relational entities and clustering algorithms. When clustering, do not cluster on the rule labels alone; other labels should be used. At the same time, avoid mixing in tags that are strongly correlated with the rule tags. To help with this, the solution tag platform provides correlation analysis between tags so that users can filter out similar ones.
  • Behavior entity set: tags generated from user behavior. Marketing strategies are developed from these tags by analyzing users' behavioral characteristics and demand types. For example, for users who have purchased Beijing air tickets and hotels, we can further analyze behavioral characteristics such as purchase time, frequency, and preferences to develop more targeted marketing strategies.

2. Lookalike algorithm based on knowledge graph and frequent patterns

Relying only on portrait tags for filtering may produce a large number of target users who do not meet the need, and how to rank them becomes a difficult problem. Traditional methods, such as sorting by value or activity, cannot ensure that the selected users are the ones most similar to the target user group. With knowledge graphs or frequent patterns, we can measure the similarity between users, and this similarity is quantifiable and scalable. At the relationship level, the algorithm can find user groups similar to the target users more accurately.
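
The idea of ranking candidates by a quantifiable similarity to seed users can be sketched as follows. Jaccard similarity over tag sets is used here only as a simple stand-in; the article does not specify the measure, and the same function would work on sets of graph neighbors or frequent itemsets. All user IDs and tag names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of tags or graph neighbors."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_lookalikes(seeds, candidates, top_k=2):
    """Rank candidates by their mean similarity to the seed users."""
    scored = []
    for user_id, tags in candidates.items():
        score = sum(jaccard(tags, seed_tags) for seed_tags in seeds.values()) / len(seeds)
        scored.append((user_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

seeds = {"s1": {"beijing_flight", "beijing_hotel", "family"}}
candidates = {
    "c1": {"beijing_flight", "beijing_hotel"},
    "c2": {"shanghai_flight"},
    "c3": {"beijing_flight", "family", "business"},
}
ranked = rank_lookalikes(seeds, candidates)
```

Unlike sorting by value or activity, the score here is defined relative to the seed group, which is what makes the expansion a lookalike rather than a generic ranking.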

3. Lookalike algorithm based on causal inference

Compared with traditional association rules and portrait tags, causal inference can answer deeper questions. Association rules and portrait labels mainly address correlation, such as "users who buy beer may also buy diapers", but cannot explain why the correlation exists, and the correlation may not hold in different cultures and markets. By applying causal inference to historical data and models, we can find the key factors that affect user behavior and conversion. These key factors can be found through relationship discovery, which in turn helps us better understand user behavior and business processes.

For example, the red part in the upper right corner of the figure marks the parts that, based on our understanding of the business, better reflect the business process, and these are used to expand to more users.
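
The difference between correlation and causation can be illustrated with the simplest causal estimator: the difference in conversion rates between a randomly treated group and a randomly held-out group. This is a sketch under the assumption of random assignment, not the method described in the talk; the group names and outcome data are hypothetical.

```python
def conversion_rate(outcomes):
    """Share of users who converted (1 = converted, 0 = did not convert)."""
    return sum(outcomes) / len(outcomes)

def average_treatment_effect(treated, control):
    """Difference in conversion rates; causal only under random assignment."""
    return conversion_rate(treated) - conversion_rate(control)

# Hypothetical outcomes for a marketing SMS campaign
sms_group = [1, 0, 1, 1, 0, 1, 0, 1]      # received the SMS
holdout_group = [1, 0, 0, 1, 0, 0, 0, 1]  # randomly held out

ate = average_treatment_effect(sms_group, holdout_group)
```

Without random assignment the same subtraction only measures correlation, because users who were chosen to receive the SMS may already differ from those who were not.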

[Figure: lookalike algorithm based on causal inference]

4. Object portraits

In constructing object portraits, we mainly focus on the attributes and characteristics of the objects, such as cities, business districts, routes, flights, and so on in the hotel portrait. These attributes help us describe and understand objects more accurately and provide rich content for their portraits.

Compared with user portraits, object portraits emphasize the similarity between objects. In practice, we usually use the similarity of objects for operations such as recommendation and ranking. In order to measure the similarity between objects, various methods can be used, such as attribute vectors and embedding. These methods can represent objects as vectors and use these vectors to perform similarity calculations. It should be noted that although the process of building object portraits is similar to the process of building user portraits, in actual applications, we need to make appropriate adjustments and optimizations based on business needs and scenarios. At the same time, it is also necessary to conduct in-depth analysis of the relationships and hierarchical structures between objects to ensure that the portraits of objects accurately reflect business needs.
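
As a minimal sketch of similarity over attribute vectors, the following uses cosine similarity in plain Python. The hotel names and the three-feature encoding are hypothetical, and learned embeddings could be plugged into the same function in place of hand-built attribute vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical attribute vectors: [star_rating / 5, near_airport, has_family_room]
hotels = {
    "hotel_a": [1.0, 0.0, 1.0],
    "hotel_b": [0.8, 0.0, 1.0],
    "hotel_c": [0.4, 1.0, 0.0],
}

def most_similar(target, items):
    """Return (name, similarity) of the item closest to `target`."""
    others = ((name, cosine_similarity(items[target], vec))
              for name, vec in items.items() if name != target)
    return max(others, key=lambda pair: pair[1])

nearest = most_similar("hotel_a", hotels)
```

Attribute vectors of this kind need no user behavior data, so they can be computed for an object as soon as its attributes are known.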

In addition, some key issues need attention when constructing object portraits.

(1) Correlation does not mean similarity. For example, with the embedding method, if high-value user groups all search for five-star hotels, the correlation between those five-star hotels may be strong. In some business scenarios, however, this correlation does not apply. We therefore need to consider object similarity carefully in light of the specific business scenario.

(2) Cold start problem. For example, in hotel profiling, when a new hotel comes online, it may lack user behavior data. In order to solve this problem, we can use attribute distance to extract large-dimensional label attributes, construct a user-friendly portrait label, and use this label to perform similarity calculations.

(3) Interpretability

## 4. Portrait tag application scenarios

Application 1: Marketing crowd selection and diffusion

Portrait tags play a vital role in marketing crowd selection and diffusion. By using portrait tags well, operators can analyze and screen the selected user groups in more detail. When the initially selected group is too large or too small, or when the marketing effect needs to be expanded or optimized, operators can diffuse or re-select the group through portrait tags to achieve better marketing results.

However, the most common problem when selecting and diffusing with portrait tags concerns the four quadrants of user conversion and operational intervention. The quadrants represent different combinations of user conversion state and operational intervention, and each calls for a different response. For example, for users with high conversion and low intervention, the strategy can be to maintain the status quo; for users with low conversion and low intervention, the strategy can be to promote conversion.

The following are the four stages of marketing selection and diffusion during the application process of portrait tags:

Scientific analysis: Deeply dig into user data and accurately locate target groups to improve conversion effect.

Auxiliary circle selection: Use tags to efficiently filter target users and improve the pertinence and efficiency of marketing activities.

Intelligent expansion: Based on algorithms and models, intelligently classify and expand user groups to expand marketing coverage.

Model implementation: Combined with actual marketing activities, optimize portrait tags and strategies to achieve the best marketing results.

Application 2: Business indicator attribution analysis

The portrait tag system can be used to analyze whether business indicators are good or bad and to further optimize strategies. During business iteration, we usually generate strategies through methods such as attribution analysis algorithms and business analysis, and then measure them through experiments. If an experimental strategy performs well, it is fully launched.

However, two problems arise in this process: how to judge whether the indicators are good or bad, and how to judge whether the experimental results are good or bad. To solve them, we carry out attribution analysis of business indicators. First, discover business problems through reports, alarms, and so on, find out their causes, and clarify the specific scenarios and the actual conversion relationships. Next, locate the cause of the problem and determine whether it is controllable or uncontrollable. If it is uncontrollable, it may be natural fluctuation and does not need much attention; if it is controllable, we need to explore further whether unknown scenarios are causing the problem.

In the qualitative analysis module, we distinguish controllable from uncontrollable factors and explore the causes of problems in unknown scenarios. Finally, we give suggestions that tell business personnel what to do in which scenarios. A typical scenario is that the conversion rate of a certain business has dropped. By analyzing the entire business process, we can work out the proportions of market factors and controllable factors. If market factors account for a large proportion, the problem can be addressed later without immediately committing a lot of manpower and material resources.

Application 3: AB experiment performance analysis

While running Qunar's AB experiment system, we often face a challenge: when the product team has invested a lot of time and resources in an experiment and the results are not significant, questions such as "Why did the experiment not work?" and "What is the direction of the next iteration?" naturally arise.

To answer these questions, we carry out AB experiment performance analysis, which has three main parts. First, we use the business process funnel model, core user portrait label identification, and identification of misleading labels in the business domain to determine whether the poor results are due to an insufficient improvement in quantity. Second, we use analysis methods such as decision trees to explore whether there are problems with the improvement in quality, such as conflicts with other experiments or an improvement that does not reach a significant proportion. Finally, we quantify the effectiveness of each action and clarify its impact on the goal.
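
One way to look for the core user groups behind a non-significant result is to repeat the significance test within each portrait label. The sketch below assumes a two-proportion z-test as the significance criterion, which the article does not specify; the label names and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts per portrait label:
# (control conversions, control users, treatment conversions, treatment users)
segments = {
    "family_travel": (200, 4000, 260, 4000),
    "business_travel": (300, 6000, 300, 6000),
}

# Overall result: sum the counts across segments.
totals = [sum(column) for column in zip(*segments.values())]
overall = two_proportion_z_test(*totals)

# Per-label results: the same test within each segment.
by_label = {label: two_proportion_z_test(*counts) for label, counts in segments.items()}
```

With these made-up numbers the overall p-value is above 0.05 while the `family_travel` segment is clearly below it, which is the kind of pattern that suggests narrowing the next iteration to a core group. Testing many segments inflates false positives, so segment-level findings are best treated as hypotheses for a follow-up experiment.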

Through these analyses, we can give the product team specific guidance that helps them choose more efficient directions for optimization and thereby achieve qualitative improvement. These analyses not only help optimize the direction of product iteration but also save resources and time for the company and improve overall business results.

## 5. Question and Answer Session

Q1: What is the difference between user behavior and business logs?

A1: User behavior data mainly records users’ interactive behaviors on the APP side, such as clicks, etc. These data mainly reflect the user’s interaction process. Business data involves various information processed in the background, such as agent connection processes, logistics information, etc. Although these data are invisible to users, they are also crucial to understanding the entire business process and improving user experience. In actual operation, we need to incorporate these data into our portrait tag system to better analyze and understand user behavior and business processes. For example, for e-commerce platforms, some data may not be relevant to users, but some involve user experience and business processes, so appropriate screening and processing are required.

Q2: How is streaming labeling currently done? Can it support more complex tag rules? Is it developed from data or configured visually?

A2: Streaming tags can be implemented through streaming computing, such as using tools such as Flink. Users can drag and drop defined data to calculate labels through streaming calculations. At the same time, you can also upload Python code or SQL code for customized calculations. In addition, it can also be supported through Spark and other methods. In streaming tags, the amount and time window of calculations need to be limited to meet different needs.

Streaming tags can support complex tag rules. Users can implement more complex label calculations by uploading Python code or SQL code.

Streaming tags can be implemented in two ways: data development and visual configuration. On the Qunar platform, users can drag and drop defined data to calculate labels through streaming computing, or upload Python code or SQL code for customized calculations.
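
The point about limiting the time window of a streaming tag can be illustrated with a small sliding-window counter. This is plain Python standing in for what a streaming job would do, not Flink or Spark code; the class name, the 60-second window, and the threshold are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque

class WindowedCountTag:
    """Sets a tag when a user has at least `threshold` events in the last `window_seconds`."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events = defaultdict(deque)  # user_id -> timestamps inside the window

    def on_event(self, user_id, timestamp):
        """Ingest one event and return whether the tag is currently set."""
        queue = self.events[user_id]
        queue.append(timestamp)
        # Evict events that have fallen out of the window.
        while queue and queue[0] <= timestamp - self.window_seconds:
            queue.popleft()
        return len(queue) >= self.threshold

tag = WindowedCountTag(window_seconds=60, threshold=3)
results = [tag.on_event("u1", t) for t in (0, 10, 20, 100)]
```

Bounding the window keeps per-user state small, since only the timestamps inside the window are retained.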

Q3: What are real-time tags?

A3: Real-time tags refer to tags that are calculated and applied in real time when user behavior or business events occur. For example, when a user submits a complaint on the front-end interface, the system will analyze the user's demands and order issues in real time, and label the user with corresponding real-time labels. This kind of real-time labeling can quickly reflect user needs and problems for timely processing and optimization. Different companies have different definitions of real-time tags. For Qunar, anything within 3 seconds is considered real-time, while hours are considered a non-real-time scenario.

Q4: Does ID Mapping merge multiple mobile phone numbers and device numbers into one unique ID, or does it give each user a unique ID? For example, a mobile phone number has been logged in on two devices, and one of those devices has also logged in with another mobile phone number. Is that one unique ID or three?

A4: With the popularity of the mobile Internet, more and more companies use the mobile phone number as the unique identifier for a user. One-click login has become common practice in the industry and makes it easier for users to log in and use applications. Platforms like Qunar also use the mobile phone number as the unique user ID, and in most cases we treat one mobile phone number as one user. In some special cases we also consider the scenario where the user changes their mobile phone number and handle it accordingly. In addition, when a mobile phone number is logged in on two devices, we apply a series of judgments to determine the user's relationship to each device. If the user logs in to the device only temporarily, we consider them a visitor; if the user holds the device for a long period, we consider them the holder.

Q5: What are the application scenarios of product labels?

A5: The most common one is product pricing. To personalize product pricing, we need product tags. These labels are calculated from specific numerical values for internal and external factors. If internal factors are not properly sorted out, the impact of external factors may be exaggerated. The approach can be understood as something like a brute-force solution: we put every factor in, try it, see how much influence each factor has, and judge for each factor whether the relationship is correlation or causation.

Q6: Do real-time business labels need to be customized and developed?

A6: After the real-time tags were built, we tried to cover exhaustively, at the development level, the real-time tags that can be obtained through basic statistics. Real-time tags based on rules and models must be custom developed.

Q7: How to manage the life cycle of tags?

A7: Some tags created at the beginning are one-time tags, which are not used again after their first use.

Q8: Can some statistical methods be used to determine the minimum sample size for AB experiments? There is a standard calculation process for the AB experiment. Can we know the approximate sample size required to achieve a statistically significant effect?

A8: Smaller businesses may have insufficient traffic. Reaching the minimum sample size may then be impossible at the operational level, so we need methods that can quickly and roughly infer the experimental effect when the minimum sample size is not reached.
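
The standard calculation that Q8 refers to can be sketched with the usual normal-approximation formula for comparing two proportions. The function name, the default significance level of 0.05, and the default power of 0.8 are conventional choices made for this example, not values given in the talk.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(baseline, mde, alpha=0.05, power=0.8):
    """Per-group sample size to detect an absolute lift `mde` over `baseline`."""
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical 5% baseline conversion rate
n_small_lift = min_sample_size(baseline=0.05, mde=0.005)
n_large_lift = min_sample_size(baseline=0.05, mde=0.02)
```

The required sample size grows roughly with the inverse square of the lift to be detected, which is why low-traffic businesses often cannot reach it for small effects.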

Q9: How are the different types of user portraits stored and displayed? Besides single tags, user portraits also combine multiple tags into a view of user preferences. How should these two types of tags be stored?

A9: Display differs from company to company. From a storage perspective, Qunar uses multiple storage methods. We can tolerate redundant storage of some data, mainly for the sake of fast real-time response. That is, when accessing tags, we try to access them with as little latency as possible.

Q10: What are the applications of models in the construction of solution labels?

A10: In my current practice at Qunar, large models are widely used in algorithm labeling. Take the simplest example first: when building user portraits, we often encounter POI landmark data, which is extracted from documents, and this is where a large model can be used. Its accuracy is, honestly, much better than that of some of the models we built ourselves in the past. Also, when building a knowledge graph, we encounter tasks such as entity disambiguation and entity merging.

Q11: Do profiling algorithm engineers also need to implement ranking recommendations?

A11: Actually no. Ranking and recommendation are the job of recommendation engineers, but the recommendation algorithm uses the output of portrait engineers. Portrait engineers need to describe the quality and application scenarios of each portrait label clearly so that recommendation and ranking engineers can use it well.

Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn to have it deleted.