
Technical evolution of real-time large model for Weibo recommendation


[Figure]

1. Technical route review

1. Business scenarios and characteristics

The recommendation business that this team is responsible for in the Weibo APP mainly includes:

① All tab columns under the homepage recommendation; for information-flow products, the first tab generally carries a relatively high share of traffic;

② The information flow users scroll into from hot search, which is also one of our scenarios, as well as the other information-flow tabs on that page, such as the video channel;

③ The immersive video scene entered by searching for, or clicking on, recommended videos anywhere in the app.

[Figure]

Our business has the following characteristics:

(1) First, from the perspective of recommendation implementation:

① There are many business scenarios.

② Users have diverse operations and feedback in the Weibo UI. Content can be viewed on the detail page or consumed directly in the stream, and in-stream feedback takes many forms, such as entering the blogger's profile page, opening the detail page, clicking pictures, playing videos, commenting, and liking.

③ Many types of material can be distributed in the homepage recommendation, for example long images, picture posts (single or multiple pictures), videos (horizontal or vertical), and articles.

(2) From the perspective of product positioning:

① Serving hot topics: Weibo's traffic swings sharply before and after a hot topic breaks out, and letting users consume hot content smoothly within recommendations is the company's requirement for the product.

② Building relationships: we hope recommendations help users accumulate social relationships on Weibo.

2. Technology selection

The following figure shows our technological progress in recent years.

[Figure]

In current recommendation frameworks, hundreds of billions of features and hundreds of billions of parameters are standard. This differs from NLP and CV: very large models in those two fields are complex in the network itself, whereas recommendation models are highly sparse. The model size is large, training usually relies on data parallelism, and serving a single user does not require all of the model's parameters.

The team's technical evolution from 2018 to 2022 has mainly been along two lines: large scale and real time. On that foundation, building complex structures delivers twice the result with half the effort.

Here is a brief introduction to our Weidl online learning platform.

The main pipeline is: user behaviors are joined into samples for the model to learn from, recommendations are served to users, and their feedback flows back in. The overall design follows a data-flow-first principle for better flexibility. No matter which kernel is used for training, the real-time update path between offline model storage and the online PS stays the same, whether the training model is hand-written LR or FM, TensorFlow, or DeepRec. The model storage is a set of data streams we built ourselves, with our own model format, which ensures that whatever backend produces the model, training updates reach the online side in under a minute and the new parameters take effect on the user's next request. Under this design principle, backends can be switched easily.

Weidl is Weibo's self-developed machine learning platform. Its Bridge mode can call operators from various deep learning frameworks, and it is also easy to swap in our own operators. For example, when we were on TensorFlow we did some memory-allocation and operator optimizations on tf; in the second half of 2022 we switched to DeepRec, and after getting to know DeepRec better we found that some of our earlier tf-based performance optimizations are similar to what DeepRec does.

The following figure lists some versions the team has shipped over the years, to help readers understand what each technical point contributed to the business. We first used FM models to solve large-scale real-time recommendation, and later built complex structures on deep models. Judging from the results, solving the online real-time problem with non-deep models first also brought substantial benefits.

[Figure]

Information-flow recommendation is different from product recommendation: it is basically a large-scale, real-time, deep architecture. There are also some difficulties and differences here. For example, real-time features are not a substitute for a real-time model; for a recommendation system, what the model learns matters more. In addition, online learning does bring some iteration problems, but weighed against the clear gains it delivers, these can be overcome over time.

[Figure]

2. Recent technology iteration of large models

This chapter introduces recent iterations of the business model from the perspectives of objectives, structure, and features.

1. Multi-objective fusion

There are many user operations in the Weibo scenario, and users express their liking for an item through many kinds of behavior, such as click interactions, duration, and pull-downs. Each objective must be modeled and estimated, and the final overall fusion and ranking matters a great deal for the recommendation business. At first this was done with static fusion and offline parameter search; later it became dynamic parameter search via reinforcement-learning-style methods; after that the fusion formula itself was optimized; and later still the fusion scores were produced directly by a model.

[Figure]

The core of the reinforcement-style parameter tuning is to split online traffic into small buckets, generate new parameter sets from the current online parameters, observe how users react to them, collect the feedback, and iterate. The key part is computing the reward. We used CEM and ES, and later a self-developed algorithm adapted to our own business needs. Because online learning changes very quickly, big problems arise if the fusion parameters cannot change with it. For example, users' preferences for video content shift very fast from Friday night to Saturday morning and from Sunday night to Monday morning, and changes in the fusion parameters must track these changes in user preference.
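
As a hedged sketch of this kind of reward-driven search (a minimal cross-entropy-method loop, not the team's self-developed algorithm; `reward_fn` is a stand-in for the reward computed from the small-traffic buckets' feedback):

```python
import numpy as np

def cem_search(n_params, reward_fn, iters=20, pop=64, elite_frac=0.2, init_std=0.3):
    """Cross-entropy method: sample fusion-weight candidates, keep the elite,
    refit the sampling distribution, and repeat."""
    mean = np.zeros(n_params)                # start from the current online fusion weights
    std = np.full(n_params, init_std)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        candidates = np.random.randn(pop, n_params) * std + mean   # one candidate per traffic bucket
        rewards = np.array([reward_fn(c) for c in candidates])     # user feedback per bucket
        elite = candidates[np.argsort(rewards)[-n_elite:]]         # best-performing parameter sets
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3   # refit, keep a little exploration
    return mean

# toy usage: the "reward" prefers weights near an (in practice unknown) optimum
best = cem_search(3, lambda w: -np.sum((w - np.array([0.5, 1.0, 2.0])) ** 2))
```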

[Figure]

Here are some small tricks in this optimization. User behavior is cyclical day to day, so it is better to re-initialize and correct the parameters at a fixed time every day, otherwise the search may drift down a biased branch. Parameter initialization should follow the prior distribution: do the prior analysis first, then fuse with differentiation. And add an anomaly-detection mechanism to ensure the fusion parameters keep updating iteratively and stably.

[Figure]

The fusion formula initially used additive fusion, back when there were not so many business objectives. As objectives were added, we found additive fusion inconvenient for supporting more of them, since it dilutes the importance of each sub-objective, so we later switched to a multiplicative fusion formula. The effect is shown in the slide below.
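
For concreteness, the two forms can be written roughly as below; the weights $w_i$, biases $b_i$, and per-objective scores $\hat{y}_i$ are illustrative, not the production formula:

$$
s_{\text{add}} = \sum_i w_i\,\hat{y}_i
\qquad\text{vs.}\qquad
s_{\text{mul}} = \prod_i \left(\hat{y}_i + b_i\right)^{w_i}
$$

In the multiplicative form each objective scales the overall score through its exponent, so adding a new objective does not shrink the relative contribution of the existing ones the way another additive term does.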

[Figure]

After the full rollout was upgraded to multi-task learning, this part was further optimized to perform objective fusion through a model. Model-based fusion captures non-linear effects better and has stronger expressive power; it also makes the fusion personalized, so each user gets a different blend.

[Figure]

2. Multi-tasking

Multi-task learning is a concept that became popular around 2019-2020. Recommendation systems often need to attend to multiple objectives at once; our business scenario has seven, for example click, duration, interaction, completion, negative feedback, entering the homepage, and pull-down refresh. Training a separate model for each objective costs more resources and is cumbersome. Moreover, some objectives are sparse and some are dense; modeled separately, the sparse objectives are generally hard to learn well, whereas learning them together alleviates that problem.

[Figure]

Introductions to multi-task modeling in recommendation generally start with MMoE, then SNR, then DMT; what we finally deployed at full traffic is essentially an SNR optimized by merging networks, among other changes.
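
As a reference point for the starting structure, a minimal MMoE layer (the generic published design with illustrative dimensions, not our production SNR variant) could be sketched in PyTorch as:

```python
import torch
import torch.nn as nn

class MMoE(nn.Module):
    """Mixture-of-experts with one softmax gate per task."""
    def __init__(self, in_dim=128, expert_dim=64, n_experts=4, n_tasks=3):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, expert_dim), nn.ReLU()) for _ in range(n_experts)]
        )
        self.gates = nn.ModuleList([nn.Linear(in_dim, n_experts) for _ in range(n_tasks)])
        self.towers = nn.ModuleList([nn.Linear(expert_dim, 1) for _ in range(n_tasks)])

    def forward(self, x):
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)   # [B, E, D]
        logits = []
        for gate, tower in zip(self.gates, self.towers):
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)            # [B, E, 1] per-task gate
            task_in = (w * expert_out).sum(dim=1)                       # weighted expert mix
            logits.append(tower(task_in))
        return logits   # one logit per task (click, duration, interaction, ...)
```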

[Figure]

Before doing multi-task learning, the key issues to resolve include: whether the losses of the different objectives conflict, and whether there is a seesaw effect between them; inconsistent sample spaces; and loss balancing. In our experience, both PCGrad and UWL show gains on test data, but once scaled to the production environment with continuous online learning and training, their effects decay quickly. Conversely, setting some weights by experience in the full online environment is not unworkable; we are not sure whether this is related to online learning or to the sample size. Plain MMoE on its own also performs fairly well. On the left are some of the actual gains in the business.
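
For reference, the UWL (uncertainty-weighting) balancing mentioned above is commonly implemented roughly as below — a generic sketch, and, as noted, its gains decayed for us under continuous online learning:

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Learnable log-variance per task: total = sum_i exp(-s_i) * L_i + s_i."""
    def __init__(self, n_tasks: int):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(n_tasks))

    def forward(self, task_losses):
        total = 0.0
        for i, loss_i in enumerate(task_losses):
            # a task with high estimated uncertainty gets a smaller effective weight
            total = total + torch.exp(-self.log_vars[i]) * loss_i + self.log_vars[i]
        return total
```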

[Figure]

Below are some of our technical evolutions starting from MMoE. Multi-task learning usually starts with simple hard parameter sharing, followed by MMoE, then SNR or PLE, all relatively mature industry methods in recent years. This team uses SNR with two optimizations. In the lower half of the figure below, the leftmost diagram is the approach of the original SNR paper; we simplified the transformation inside the experts and, at the same time, introduced exclusive experts alongside shared experts. Based on analysis of the actual values and estimation deviations observed in the business, some dedicated experts are added.

[Figure]

3. Multi-scenario technology

We are responsible for many recommendation scenarios, so it is natural to consider multi-scenario techniques. Multi-task learning deals with objectives that can be sparse; multi-scenario learning deals with scenarios of different sizes. Small scenarios converge poorly because their data is insufficient, while large scenarios converge better, and even two scenarios of similar size have gaps between them. Transferring knowledge across scenarios benefits the business. This has been a hot direction recently and shares many techniques with multi-task learning.

[Figure]

On top of the multi-task model, a multi-scenario model can be built. Compared with the multi-task structure, what is added is the Slot-gate layer in the figure below: through the Slot-gate, the same embedding plays different roles in different scenarios. The Slot-gate output can go three ways: into the expert network, into the target task, or into the features.
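
A minimal sketch of the Slot-gate idea, assuming it is a scenario-conditioned gate producing one weight per feature slot (the gate form and dimensions are assumptions, not the production layer):

```python
import torch
import torch.nn as nn

class SlotGate(nn.Module):
    """Scenario-conditioned gate: one scalar weight per feature slot."""
    def __init__(self, n_slots: int, scene_dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(scene_dim, n_slots), nn.Sigmoid())

    def forward(self, slot_emb: torch.Tensor, scene_emb: torch.Tensor) -> torch.Tensor:
        # slot_emb: [B, n_slots, emb_dim]; scene_emb: [B, scene_dim] (scenario indicator features)
        w = self.gate(scene_emb).unsqueeze(-1)   # [B, n_slots, 1], per-slot importance in this scenario
        gated = slot_emb * w                     # same embedding, scenario-specific weighting
        return gated.flatten(start_dim=1)        # feed into experts, the target task, or the feature side
```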

[Figure]

The main model uses SNR in place of CGC, in line with our multi-task iteration. Below is the current application that mixes multi-task and multi-scenario modeling across two internal business scenarios, hot-spot and popular: the homepage recommendation is the popular stream, and the discovery-page recommendation is the hot-spot stream.

The overall structure is similar to SNR, with three objective towers on top: click, interaction, and duration. These three towers split into six objectives across the popular and hot-spot scenarios. In addition, an embedding-transform layer is added. Unlike the Slot-gate, which learns feature importance, the embedding-transform layer accounts for the difference in embedding space across scenarios and performs an embedding mapping; some features have different dimensions in the two scenarios and are converted through this layer.

[Figure]

4. Interest representation

Interest representation is a technique that has been discussed a great deal in recent years. From Alibaba's DIN to SIM and DMT, it has become the industry mainstream for modeling user behavior sequences.

[Figure]

The DIN we used at first builds multiple behavior sequences for different behaviors, introduces an attention mechanism to give different weights to the materials in a sequence, and uses the local activation unit to learn the weight distribution between the user sequence and the current candidate being ranked. It was applied in the fine ranking of the popular stream and brought measurable business gains.

The core of DMT is applying Transformers to multi-task learning. Our team used a simplified DMT model, removed the bias module, replaced MMoE with SNR, and after going live achieved certain business gains.

[Figure]

Multi-DIN expands to multiple sequences and uses the candidate material's mid, tag, authorid, and so on as queries. Attention is applied to each sequence separately to obtain interest representations, which are then concatenated with the other features and fed into the multi-task ranking model.
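
A minimal sketch of the per-sequence target attention involved here (DIN-style local activation; the attention MLP size and the softmax normalization are assumptions):

```python
import torch
import torch.nn as nn

class TargetAttention(nn.Module):
    """Weight each behavior in a sequence by its relevance to the candidate (query)."""
    def __init__(self, emb_dim: int, hidden: int = 36):
        super().__init__()
        # score from [behavior, candidate, behavior - candidate, behavior * candidate]
        self.mlp = nn.Sequential(nn.Linear(4 * emb_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, seq_emb, query_emb, mask):
        # seq_emb: [B, T, D]; query_emb: [B, D] (e.g. candidate mid/tag/authorid embedding); mask: [B, T]
        q = query_emb.unsqueeze(1).expand_as(seq_emb)
        att_in = torch.cat([seq_emb, q, seq_emb - q, seq_emb * q], dim=-1)
        scores = self.mlp(att_in).squeeze(-1).masked_fill(mask == 0, -1e9)
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        return (weights * seq_emb).sum(dim=1)   # one interest vector per (sequence, candidate)
```

One such module per behavior sequence (click, duration, interaction, ...) yields the interest vectors that are concatenated with the other features.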

[Figure]

We also ran experiments on making the sequences longer, such as the click, duration, and interaction sequences: expanding each sequence from 20 to 50 items works better, consistent with the conclusions in the papers, but longer sequences cost more compute.

[Figure]

Ultra-long, user-lifecycle-level sequence modeling differs from the long-sequence modeling above: the data cannot be pulled as request-time features. Instead, the user's long behavior-sequence features are built offline; or the relevant items are found by retrieval and embeddings are generated from them; or the main model and the ultra-long-sequence model are modeled separately, with the resulting embedding fed into the main model.

In the Weibo business, ultra-long sequences are not that valuable, because attention shifts quickly on the internet: a hot search fades in a day or two, and content older than about seven days is distributed far less in the feed. A behavior sequence that is too long therefore dilutes, to some extent, the estimated user preference for an item. For low-frequency or returning users, however, this conclusion differs somewhat.

[Figure]

5. Features

Using an ultra-large-scale model also brings problems at the feature level. For example, some features that theoretically should help the model fall short of expectations once added. This is the reality the recommendation business faces: because the model is very large and a lot of id-level information has already been fed in, user preferences are already well expressed, and adding more statistical features at that point may not help much. Below are the features that have proven relatively useful in this team's practice.

First, match features work relatively well: fine-grained statistics of a user against a single material, a single content type, or a single blogger can bring gains.

[Figure]

Multi-modal features are also quite valuable. The whole recommendation model is driven by user behavior, and low-frequency or niche items have too little behavior in the system, so introducing more prior knowledge brings gains. Multi-modality introduces a layer of semantics via NLP and related techniques, which helps both low-frequency items and cold start.

This team has used two kinds of methods to introduce multi-modal features. The first is to integrate multi-modal embeddings into the recommendation model, freeze the gradients of these embeddings at the bottom, and update only the MLP above them. The other is to cluster the multi-modal representations before they enter the recommendation model and feed the cluster IDs into the model as features for training; this is an easier way to bring the information in, but some of the specific multi-modal semantics is lost.
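
A minimal sketch of the first method, freezing pretrained multi-modal embeddings and training only the MLP above them (layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class FrozenMultiModalTower(nn.Module):
    def __init__(self, pretrained_emb: torch.Tensor, out_dim: int = 64):
        super().__init__()
        # pretrained_emb: [n_items, mm_dim] from the multi-modal model; gradients are frozen
        self.mm_emb = nn.Embedding.from_pretrained(pretrained_emb, freeze=True)
        self.proj = nn.Sequential(                      # only these layers are updated
            nn.Linear(pretrained_emb.size(1), 128), nn.ReLU(), nn.Linear(128, out_dim)
        )

    def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
        return self.proj(self.mm_emb(item_ids))         # projected embedding joins the ranking model
```

The second method would instead cluster `pretrained_emb` offline (for example with k-means) and feed the resulting cluster IDs into the model as ordinary sparse features.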

We have tried both methods in our business. The first increases model complexity and requires a lot of space transformation, feature-importance work, and so on, but it yields good gains. The second learns from cluster IDs, keeps the complexity outside the model, and is simpler to serve online; it reaches roughly 90% of the effect, and you can also build statistical features on the cluster IDs, which combine to good effect.

[Figure]

After adding multi-modal features, the biggest beneficiaries are high-quality, low-exposure materials, which helps address cold start. Materials with little exposure, which the model cannot learn well from behavior alone, rely heavily on the extra information carried by the multi-modal content, and this is also positive for the content ecosystem.

[Figure]

The motivation for Co-Action was that feature-crossing attempts such as DeepFM and Wide & Deep produced nothing for us, which we suspect is caused by a conflict between the cross features and the embeddings shared with the DNN part. Co-Action effectively adds storage, opening up separate parameter space for the crosses; this enlarges the expressive space and brought good gains in the business.
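
The published Co-Action Network builds a small network from one feature's induction embedding and feeds the other feature through it; to keep things simple, the snippet below only illustrates the "separate storage for crosses" idea with a dedicated cross-embedding table that shares nothing with the DNN-side embeddings (bucket count, hashing, and dimensions are assumptions, not the production design):

```python
import torch
import torch.nn as nn

class CrossEmbedding(nn.Module):
    """Dedicated parameter space for (user-side id, item-side id) crosses,
    kept separate from the embeddings shared with the DNN part."""
    def __init__(self, n_buckets: int = 1_000_000, dim: int = 16):
        super().__init__()
        self.table = nn.Embedding(n_buckets, dim)
        self.n_buckets = n_buckets

    def forward(self, uid_feat: torch.Tensor, item_feat: torch.Tensor) -> torch.Tensor:
        # uid_feat / item_feat: LongTensors of feature ids; hash the pair into its own bucket space
        pair_id = (uid_feat * 1_000_003 + item_feat) % self.n_buckets
        return self.table(pair_id)   # concatenated with the other features downstream
```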

[Figure]

3. Expression consistency across the pipeline

This part covers coarse ranking and recall. Although computing power cannot support fine-ranking a candidate set of millions, so the pipeline is split into recall, coarse ranking, and fine ranking, logically they are all the same problem. As the figure below shows, coarse ranking truncates the candidates, and only about 1,000 items reach fine ranking. If coarse ranking and fine ranking express preferences very differently, items that fine ranking would score highly may be cut during truncation. Fine ranking and coarse ranking differ in features and model structure: coarse ranking is usually close to the recall architecture, an approximate vector-retrieval structure whose features only interact late, so it naturally expresses things differently from the fine-ranking model. If consistency can be improved so that both sides capture the same trends, business metrics rise as well.

[Figure]

The figure below shows the technical lineage of the coarse-ranking consistency iterations: the upper line is the two-tower track, the lower one the DNN track. Because two-tower features interact late, many ways of crossing the two towers' features were added, but the ceiling of vector retrieval is simply too low, so starting in 2022 there is a DNN branch for coarse ranking. It puts more pressure on the engineering architecture, for example feature screening, network pruning, and performance optimization, and the number of items scored per request is smaller than before, but the scores are better, so the smaller candidate count is acceptable.
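
For contrast with the DNN branch, the two-tower coarse ranker's late interaction is essentially a dot product between independently computed user and item vectors, which is also why its ceiling is limited (a generic sketch, dimensions assumed):

```python
import torch
import torch.nn as nn

class TwoTower(nn.Module):
    def __init__(self, user_dim=256, item_dim=256, out_dim=64):
        super().__init__()
        self.user_tower = nn.Sequential(nn.Linear(user_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))
        self.item_tower = nn.Sequential(nn.Linear(item_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

    def forward(self, user_feat, item_feat):
        u = nn.functional.normalize(self.user_tower(user_feat), dim=-1)
        v = nn.functional.normalize(self.item_tower(item_feat), dim=-1)
        return (u * v).sum(-1)   # features only meet here, hence the expressiveness ceiling
```

Because the item vectors can be precomputed and indexed, this form scales to scoring many items per request, which is exactly the trade-off against the heavier DNN branch described above.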

[Figure]

DSSM-autowide adds a DeepFM-style cross on top of the two towers, and it lifted business metrics; but the next project, which used a new crossing method, did not improve as much.

[Figure]

So we feel the gains still available on top of the two towers are limited. We also tried a two-tower-based multi-task coarse-ranking model, but still could not get around the limitations of the two-tower structure.

[Figure]

Given these problems, the team optimized the coarse-ranking model into a stacking architecture that joins a DNN with a cascade model.

In the cascade, the two towers filter first, and the filtered, truncated candidates go to the DNN model for coarse ranking, which is effectively running a coarse and a fine stage inside coarse ranking itself. After switching to a DNN model, more complex structures can be supported and changes in user interest are adapted to more quickly.

[Figure]

The cascade plays an important role in this framework: without the cascade model it is hard to pick, from the larger candidate set, the smaller one for the coarse-ranking DNN to use. The more important question in the cascade is how to construct samples. As the figure below shows, from a million-scale material library, recall returns thousands; coarse ranking keeps about 1,000 for fine ranking; roughly 20 items end up exposed; and the user's actions number in the single digits. The whole process is a funnel from a large library down to user behavior. The core point when building the cascade is to sample from each stage so as to form both hard and relatively easy pairs for the cascade model to learn from.
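
A hedged sketch of building training pairs from the funnel stages; the stage names and the uniform sampling here are illustrative, since the real mix of hard and easy pairs is tuned in the business:

```python
import random

def build_cascade_pairs(clicked, exposed_unclicked, fine_ranked_unexposed,
                        coarse_filtered, random_library, n_pairs=1024):
    """Pair an item from a higher funnel stage (positive) against one from a deeper stage (negative).
    Adjacent-stage pairs are hard; pairs spanning the whole funnel are easy."""
    stages = [clicked, exposed_unclicked, fine_ranked_unexposed, coarse_filtered, random_library]
    pairs = []
    for _ in range(n_pairs):
        hi = random.randrange(len(stages) - 1)        # stage of the positive item
        lo = random.randrange(hi + 1, len(stages))    # any deeper stage supplies the negative
        if stages[hi] and stages[lo]:
            pairs.append((random.choice(stages[hi]), random.choice(stages[lo])))
    return pairs
```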

[Figure]

The figure below shows the gains brought by cascade optimization and global negative sampling.

[Figure]

Next, we will introduce causal inference, which has become popular recently.

Our motivation for using causal inference is this: if we push content that everyone likes, users' click metrics look good, but users also have some relatively niche interests, and when we recommend those niche materials to them they like them just as much. To the user the two are equivalent, but for the platform, the more niche content that can be surfaced, the more personalized the product is, whereas the first kind is the easier one for the model to learn. Causal inference is meant to address this problem.

The specific method is to build pairwise sample pairs: materials the user clicked that have low popularity are paired against materials that are highly popular but the user did not click, and a Bayesian (pairwise) loss is used to train the model.
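
A minimal sketch of such a Bayesian pairwise (BPR-style) objective, where the clicked-but-niche material should outscore the popular-but-unclicked one; the production loss and any debiasing terms are not shown:

```python
import torch
import torch.nn.functional as F

def pairwise_bpr_loss(score_niche_clicked: torch.Tensor,
                      score_popular_unclicked: torch.Tensor) -> torch.Tensor:
    """Maximize P(clicked niche item ranks above unclicked popular item)."""
    return -F.logsigmoid(score_niche_clicked - score_popular_unclicked).mean()
```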

In our practice, it is easier to gain from causal inference in the coarse-ranking and recall stages than in fine ranking. The reason is that the fine-ranking model is relatively complex and already has good personalization, whereas coarse ranking and recall, even when they use DNNs, use pruned DNNs, so their personalization still lags behind. Applying causal inference where personalization is weaker naturally shows a clearer effect than applying it where personalization is already strong.

[Figure]

4. Other technical points

1. Sequence Rearrangement

Re-ranking uses beam search: a reward function is designed in combination with the NEXT pull-down model, a variety of candidate sequences is generated, and the sequence with the highest expected reward is selected. After scaling up, the effect has been unstable, and the details are still being optimized.
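
A bare-bones sketch of beam search over candidate orderings; `reward_fn` stands in for the reward designed with the NEXT pull-down model and is not the production function:

```python
def beam_search_rerank(items, reward_fn, beam_width=4, seq_len=10):
    """Grow partial sequences item by item, keeping only the top `beam_width`
    sequences by reward at each step."""
    beams = [([], 0.0)]
    for _ in range(min(seq_len, len(items))):
        expanded = []
        for seq, _ in beams:
            for it in items:
                if it in seq:
                    continue
                new_seq = seq + [it]
                expanded.append((new_seq, reward_fn(new_seq)))   # e.g. pull-down prob + expected duration
        beams = sorted(expanded, key=lambda x: x[1], reverse=True)[:beam_width]
    return beams[0][0]   # sequence with the highest estimated reward
```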

[Figure]


2. Graph technology

Graph technology mainly includes two parts: the graph database and graph embedding. For recommendation, a graph database makes things more convenient and cheaper. Graph embedding refers to random-walk-style methods that map graph data (typically a high-dimensional matrix) into low-dimensional dense vectors. Graph embeddings need to capture the graph's topology, the relationships between vertices, and other information such as subgraphs and edges, which will not be covered here.
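
A DeepWalk-style sketch of the random-walk-then-skip-gram recipe, using gensim's Word2Vec for the skip-gram step; the toy graph, walk lengths, and dimensions are illustrative:

```python
import random
from gensim.models import Word2Vec

def random_walks(adj, n_walks=10, walk_len=20):
    """adj: {node: [neighbor, ...]}, e.g. user->author follow edges or user->post interactions."""
    walks = []
    for _ in range(n_walks):
        for node in adj:
            walk, cur = [str(node)], node
            for _ in range(walk_len - 1):
                nbrs = adj.get(cur)
                if not nbrs:
                    break
                cur = random.choice(nbrs)
                walk.append(str(cur))
            walks.append(walk)
    return walks

# skip-gram over the walks gives low-dimensional node embeddings for recall or as model features
walks = random_walks({"u1": ["a1", "a2"], "a1": ["u1"], "a2": ["u1"]})
model = Word2Vec(sentences=walks, vector_size=64, window=5, min_count=0, sg=1, epochs=5)
vec_u1 = model.wv["u1"]
```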

[Figure]

Algorithms based on random walks, graph structure, graph contrastive learning, and so on can be used in recommendation for recall over user-post interactions and user-author follow relationships. The mainstream approach is to embed images, text, users, and so on, and add them as features to the model. There are also more cutting-edge attempts, such as building an end-to-end network and using a GNN directly for recommendation.

[Figure]

The figure below is the current end-to-end model; we are still experimenting with it and it is not yet the mainstream online version.

[Figure]

The figure below shows embeddings generated from the graph network; the one on the right shows similarities computed from the accounts' domains. For Weibo, computing embeddings from the follow relationship yields gains.

5. Question and Answer Session

Q1: Many items in the recommendation feed are only browsed, never clicked. How do you tell whether the user is interested in them? Through the item's dwell time on the list page?

A1: Yes. In the information-flow business, duration is an important optimization metric. It is not convenient to directly optimize how long the user stays in the app as a whole each day; what is optimized instead is how long they stay on each item. If duration is not treated as an objective, it becomes easy to push a lot of shallow-consumption content.

Q2: If a failover occurs during training, will there be consistency problems with real-time model updates? How do you handle model consistency?

A2: For recommendation training today, CPU-based training is mostly asynchronous; we don't tend to run global synchronized rounds where, after each round, the updates are gathered, applied on the PS, and the next round starts. For performance reasons people basically don't do that, so whether training is real-time or online, strong consistency cannot be achieved.

If a failover happens during training and you are training on a stream, the data stream, such as Kafka or Flink, records how far you have trained, and the PS also keeps a record of the last training state, which together give something close to a global checkpoint.

Q3: Does using the fine-ranking order for recall lower the iteration ceiling of the recall model?

A3: The iteration ceiling can be understood as the ceiling of recall, and as I see it, recall's ceiling is precisely to not fall short of fine ranking. For example, if computing power were infinite, fine-ranking all 5 million materials would be the best treatment for the business. When the investment in recall is not that large, the goal is to surface for fine ranking the items it would itself rate best: the closer the top 15 chosen from the 6,000 recalled items is to the top 15 chosen from all 5 million, the better the recall module is doing. Understood this way, learning the fine-ranking order does not lower the recall model's iteration ceiling online; it moves it toward the upper limit. That said, this is our view; depending on your own business positioning, the conclusion may not be universal.

