Home >Technology peripherals >AI >Fudan releases 'News Recommendation Ecosystem Simulator' SimuLine: a single machine supports 10,000 readers, 1,000 creators, and 100+ rounds of recommendations

Fudan releases 'News Recommendation Ecosystem Simulator' SimuLine: a single machine supports 10,000 readers, 1,000 creators, and 100+ rounds of recommendations

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB
WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBforward
2023-06-12 11:57:431307browse

Understanding the evolution of online news communities is crucial to designing more effective news recommendation systems, but existing research is limited in understanding how recommendation systems influence community evolution due to the lack of appropriate data sets and platforms, leading to suboptimal system design that may affect long-term utility.

In response to this problem, the CISL research team of the School of Computer Science at Fudan University developed SimuLine, a news recommendation ecosystem evolution simulation platform.

SimuLine builds a latent space reflecting human behavior from real data based on pretrained language models (Pretrained Language Models) and inverse propensity score (Inverse Propensity Score), and then uses agent-based Model simulation (Agent-based Modeling) simulates the evolutionary dynamics of the news recommendation ecosystem.

SimuLine supports 100 rounds of creation-recommendation-interaction simulation for 10,000 readers and 1,000 creators on a single server (256G memory, consumer-grade graphics card), and also provides A comprehensive analytical framework that includes quantitative metrics, visualizations, and textual explanations.

Extensive simulation experiments show that SimuLine has great potential in understanding community evolution processes and testing recommendation algorithms.

Fudan releases News Recommendation Ecosystem Simulator SimuLine: a single machine supports 10,000 readers, 1,000 creators, and 100+ rounds of recommendations

##Author: Zhang Guangping, Li Dongsheng, Gu Hansu, Lu Tun, Shang Li, Gu Ning

Paper address: https://arxiv.org/abs/2305.14103

News recommendation ecosystem evolution simulation platform

With the popularity of social media (Social Media), people increasingly rely on online news communities to publish and obtain news. Every day, millions of news are published by content creators on various websites. This type of online news community is distributed by a recommendation system and is read by a large number of users.

With the production and consumption of news content, online news communities are in an ongoing process of dynamic evolution.

Like other types of online communities, the development of online news communities also conforms to the famous life cycle theory, that is, it goes through "start-up" - "growth" - "maturity" - " "decline" stage.

Through the perspective of life cycle theory, a large amount of research work has explored the evolution model of online communities and made suggestions for the operation of each stage in the life cycle.

However, as one of the most important technical infrastructures of online news communities, the impact of recommendation systems on the evolution of online news communities is still unclear.

In order to solve this mystery, the CISL research team from the School of Computer Science at Fudan University focused on the following three research questions and tried to find their answers through simulation experiments:

1) What are the characteristics of each stage of the life cycle of News Recommendation Ecosystems (NREs)?

2) What are the key factors driving the evolution of NREs, and how do these factors interact with each other to affect the evolutionary process?

3) How to achieve better long-term multi-party effectiveness through the design strategy of the recommendation system, thereby avoiding the community from falling into "decline"?

In order to answer these three research questions, the CISL research team developed SimuLine, a news recommendation ecosystem evolution simulation platform.

SimuLine first generates synthetic data based on real-world data sets. In order to solve the inherent exposure bias problem of the original data set (Exposure Bias), SimuLine introduced the inverse propensity score (Inverse Propensity Score) to eliminate the bias.

In order to establish a latent space that is close to the human decision-making process, SimuLine introduces pretrained language models (Pretrained Language Models) based on large-scale corpora to construct the latent space. Finally, SimuLine uses simulation based on agent models (Agent-based Modeling) simulates the behavior and interaction of users, content creators and recommendation systems in the news recommendation ecosystem.

Synthetic data generation

When trying to build a simulator that represents a user, the first question that comes to mind is "How should the user's various behaviors be?" Characterization?"

This problem actually has a very direct solution that is widely used in the field of recommendation systems, which is to build a latent space (Latent Space) and then combine the user's interests All news content is mapped to this space.

In this way, it is very convenient to measure the user's love for news through the similarity of vectors in the latent space, and then define a series of behavioral logic and rules.

Fudan releases News Recommendation Ecosystem Simulator SimuLine: a single machine supports 10,000 readers, 1,000 creators, and 100+ rounds of recommendations

##Build

So how to build What about this hidden space?

Some students said: "What's so difficult about this!? Isn't that what the recommendation algorithm is used to do! Wouldn't it be better to just learn one using the recommendation algorithm? "

This is indeed a method, but there are also some obvious problems.

The most puzzling thing for the CISC research team is a logical vulnerability called "Algorithm Confounding". That is to say, if recommendation algorithm A is used to build a latent space and map users and news as their real basis for behavioral decision-making, then wouldn’t Algorithm B used in the subsequent simulation process become fitting Algorithm A (will this look familiar to students who know some distillation learning)?

In addition, most of the current recommendation algorithms are still black box models. Even if you turn a blind eye and let Algorithm Confounding pass, there will be no problem when analyzing simulation data. You will be confused (this dimension is getting bigger, but what does this dimension represent???).

Just when the research team was at a loss, a white flash of light flashed across: It seemed that I had seen an article before saying that a language model trained based on a large-scale corpus (which was still Bert's at the time) The world, ChatGPT has not yet been born) can show some basic human cognition (that is, the famous King – male Female = Queen).

Wouldn’t this thing be very suitable for building latent space:

1. It can encode users and news;

2. By learning a global text representation from a large-scale corpus, the human cognition it embodies should be basic and universal, thus avoiding the problem of Algorithm Confounding;

3. Although it is not clear what each dimension in its latent space represents, this does not affect the understandability of this space. It can be provided for each point in the space through similar vector retrieval. A rough interpretation of the text.

This is simply wonderful! The decision is yours!

Mapping

##solve the hidden space The next step is to map users and news to this space.

News is easy to talk about. News must have rich text information and it can be encoded directly, but how should users deal with it? Is it possible to find an average using the news that the user likes in the history?

Can't!

The abominable Algorithm Confounding has changed its name and is here again. This time it is called Exposure Bias, which means that the user's like record does not necessarily fully reflect the user's interest, because the news that the user likes will definitely It is the news that the user has seen, and the news that the user has seen itself has been filtered by the recommendation system. There is a possibility that the user did not like it because he did not see it.

Fortunately, after so many years of rapid progress, the arsenal in the field of recommendation systems is sufficient. The research team found a handy weapon to solve this problem from the Unbiased Recommendation warehouse: Inverse Propensity Score (IPS).

To put it simply, it is to weight the recommended samples by estimating their exposure density, thereby offsetting the bias it brings during the model learning process, so that the user’s The encoding problem is solved.

As for the final content creators, their content publishing behavior is not interfered by Exposure Bias, and their historical records are directly weighted. In fact, after the above operations, the data preparation work has been basically completed, but there are still two shortcomings:

· First, the data scale has not been adjusted and may not be suitable for computing resources. (The little donkey pulls the big grinder/the big donkey grinds the foreign worker);

· Secondly, the user’s privacy is not respected. Therefore, the research team added a layer of generative model based on the user coding of the original data set.

Considering that news platforms are always designed with partition navigation (finance, sports, technology, etc.), and the gathering of users in various partitions is also obvious, the research team promoted Gauss Mixture models (GMM) are responsible for this task.

Agent Modeling

After completing the preliminary data preparation work, you can start modeling user behavior.

The research team adopted the Agent-based Modeling method, which is to model the behavior of individuals and the interactions between individuals, and then simulate the dynamics of the group by deploying a large number of Agents.

Fudan releases News Recommendation Ecosystem Simulator SimuLine: a single machine supports 10,000 readers, 1,000 creators, and 100+ rounds of recommendations


## Briefly recall the user’s online news reading process (for example, if you read today headlines), the user will first see a series of news recommended by the recommendation system on a certain page, and then the user will simply browse the titles, pictures and summaries of each news. If a certain piece of news arouses the user's interest, they will Click in to see what is specifically said. After reading the news, if the user feels that the news is good, worth reading, or consistent with their own opinions, the user will express his or her opinion of the news through likes and other methods. recognition.

Definition

In this process, the interaction between users and news can be divided into three levels (Exposure, clicks and likes), where clicks and likes are active behaviors of the user and need to be defined in the User Agent.

Here the research team summarizes the user’s clicking behavior as a probabilistic selection behavior, that is, based on the matching degree between the user and the news (the similarity in the latent space of the two can be used degree measurement), users have a certain probability to select some news of interest to them from the list and click to read it.

This definition is more flexible than directly clicking on the most matching news. That is to say, it does not necessarily mean that a high matching degree will be read, and it is more in line with the real situation.

As for the behavior of likes, we cannot simply consider the matching degree of the news. After all, as we all know, the phenomenon of headline-grabbing is still common in the news.

Therefore, the research team introduced an abstract concept of "news quality" to generally represent the value of a news report. In this way, the user's like behavior can be subjectively Interest and objective quality are jointly characterized.

The research team adopted the expectation model to control the Agent's like behavior. Specifically, it first calculated the utility (Utility) of a user reading a certain news based on the degree of interest matching and news quality. If this utility If the user's expectations are exceeded (the research team uses a hyperparameter Threshold to represent the specific value of this expectation), then the like behavior will be triggered.

The intuitive explanation of this design is that if a piece of news makes me happy, whether it is because it caters to me or the report itself is very objective and comprehensive, I will not be stingy. Like him.

In addition, during the news reading process, the user’s interests or opinions are obviously not static.

For example, if a user sees a news report that they like very much, it may stimulate the user's desire to dig deeper into related news. On the contrary, if a report makes the user feel It is completely full of nonsense. If users see similar reports in the future, they will be less likely to click on them to see the details of the report.

This phenomenon was modeled by the research team as a User-drift Model.

Creative behavior modeling

Next modeling The creative behavior of news creators.

News creation in the real world will be affected by various factors. The research team simplifies it here as a greedy process, that is, the author always hopes that the news he creates will be more Approved by many readers.

The specific Agent behavior control research team adopts a solution similar to user clicks. Creators conduct probability sampling based on the likes of the news they created in the previous round and select new ones. A round of creative topics, and then news creation around the topics. The process of news creation is similarly modeled as a process of sampling from a topic-centered Gaussian distribution in the latent space.

In addition to the content of the news (latent space representation), the quality of the news also needs to be modeled. This is based on two basic assumptions that are consistent with the laws of reality:

1. There is a positive correlation between the author's number of likes and income, which means the more likes the author receives, the more likes the author receives. Income from reading, but as the number of likes increases, the income from a single like will gradually decrease;

2. Creators with high incomes will create better quality content due to their more sufficient budgets. High news. Based on this, a mapping function from the number of likes in the previous round to the quality of news in the next round can be constructed to control the quality of news creation.

Recommendation system modeling

Finally, the recommendation system behavior is modeled.

Algorithm recommendation and cold start recommendation are the two basic components of the news recommendation system. In order to provide personalized algorithm recommendations, the recommendation system first uses recommendation algorithms, such as BPR, etc., to learn the representation of users and news in the embedding space from historical interaction data (the research team uses latent space to refer to the large-scale language model encoding Real user interest space, using embedding space to refer to the space learned by the recommendation algorithm and used to generate the recommendation list).

However, due to the uncertainty of user like behavior and the limitation of the news validity window, algorithm recommendations cannot guarantee to cover all users. For this part of the gap, simple random recommendations can be used to fill this gap. Completion.

Due to the lack of historical interaction records, newly created news cannot participate in algorithm recommendations. SimuLine applies random recommendations and heuristic recommendation algorithms (such as new reports from historically favorite creators ) and other strategies to recommend cold start news.

In addition, SimuLine also supports other heuristic news recommendation strategies, such as breaking news, content creator-based promotion, and topic-based promotion.

All recommendation strategies have independent push quotas. The recommendation system combines news recommendations from all channels to form the final recommendation list.

Simulation experiment

The data is in place! The model has been built! What follows is an exciting experiment!

The research team selected the Adressa data set, which is widely used in the field of news recommendation. This data set provides the complete web logs of the Norwegian news website www.adressa.no for a certain week in February 2017. , compared with other excellent news recommendation data sets (such as Microsoft's MIND), it natively provides very critical news author information. Correspondingly, the language model uses BPEmb, which natively supports Norwegian. For more deployment details, you can refer to the first section of Chapter 4 in the paper.

So how to analyze the simulation results of SimuLine? SimuLine provides a comprehensive analysis framework from multiple perspectives for your reference.

The first is the most commonly used quantitative indicator evaluation system.

In order to fully reflect the evolution process of the news recommendation ecosystem, the research team summarized the quantitative indicators that have appeared in the existing literature and constructed a relatively complete set of evaluations from the following five aspects. System:

1) Interactivity, including the number of likes and its Gini index. A lower Gini index represents better fairness;

2) Coverage, including the number of users and news covered by algorithm recommendations;

3) Quality, including the average quality of news within the time limit and weighted by the number of likes The average quality of news during the time limit and the Pearson correlation coefficient between news quality and the number of likes;

4) Homogenization, including the Jaccard index among users, the higher the value, the higher the The higher the degree of overlap in news reading between users;

5) Matching degree, including the latent space representation cosine similarity between users and their favorite news.

1. Life cycle

The following three pictures The quantitative evaluation results of users, creators, and recommendation systems under different Agent hyperparameter conditions are shown respectively.


Fudan releases News Recommendation Ecosystem Simulator SimuLine: a single machine supports 10,000 readers, 1,000 creators, and 100+ rounds of recommendations

Fudan releases News Recommendation Ecosystem Simulator SimuLine: a single machine supports 10,000 readers, 1,000 creators, and 100+ rounds of recommendations

Fudan releases News Recommendation Ecosystem Simulator SimuLine: a single machine supports 10,000 readers, 1,000 creators, and 100+ rounds of recommendations

#It can be seen that the simulation process and results are relatively stable under various hyperparameters, and are divided into approximately the tenth and twentieth rounds. boundaries (different indicators fluctuate to a certain extent), and the evolution of the system shows obvious stages (blue vertical lines are used to draw the rounds of stage transitions in the picture), which is consistent with the famous life cycle theory. consistent.

The first discovery was made from this: The online news community driven by the recommendation system naturally shows "startup" - "growth" - "maturity" under different user groups. & decline” life cycle.

#2. User differentiation

In addition to quantitative indicators, visualization is also important to assist in understanding the community evolution process tool.

The research team obtained the following snapshots of the system evolution process through PCA dimensionality reduction visualization (news is marked in blue, users with like records are marked in green, and users without like records are users are marked in red. The node size represents the number of likes/likes).

Fudan releases News Recommendation Ecosystem Simulator SimuLine: a single machine supports 10,000 readers, 1,000 creators, and 100+ rounds of recommendations

It can be seen that although the quantitative indicators show a multi-stage pattern, the evolution trend of latent space representation is is consistent, that is, users are gradually divided into in-the-loop users and out-the-loop users.

Users in the circle form a stable community with convergent interests, while users outside the circle show scattered interests.

In the evolution process between the 10th and 20th rounds, users have basically completed differentiation, which shows that the growth stage plays a crucial decisive role in user participation.

This leads to the second discovery: The online news community driven by the recommendation system will inevitably produce the convergence of community topics and lead to the differentiation of users, which determines user participation The critical period is the growth stage.

3. Interest assimilation

As mentioned earlier, since SimuLine builds the latent space through large-scale pre-trained language models, each vector in the space can be interpreted through similar word retrieval, which helps to pass cases Research to understand the evolution of individual users.

The research team randomly selected 3 users from in-circle users and out-of-circle users respectively. The following table shows the evolution of their interests.

Fudan releases News Recommendation Ecosystem Simulator SimuLine: a single machine supports 10,000 readers, 1,000 creators, and 100+ rounds of recommendations

For users in the circle, their interests are becoming more abstract, broad and general , for example, from "actor" to "job", from "Oslo" to "Norway" to "Europe". The evolution speeds of different users vary, but they all converge at the 50th round. This phenomenon reflects the gradual migration of users' preferences from personalized niche topics to trending topics widely discussed on the platform as a result of continuous interaction with the recommendation system.

For users outside the circle, their interests change slightly, but they are always focused on specific and personalized topics. For example, users No. 4 and No. 6 remained interested in "athletes", "tea" and "bills" respectively throughout the simulation process.

This leads to the third discovery: In an online news community driven by a recommendation system, the user’s personalized interests are assimilated in the process of continuous interaction with the recommendation system .

4. Start-up phase

With the help of the above quantitative indicators, visualization, With three powerful tools for text translation, SimuLine can conduct a comprehensive physical examination of the system's evolution process.

Since the evolution process of the online news community driven by the recommendation system is in line with the life cycle theory, let’s analyze how the community evolves at each life stage from the perspective of the life cycle.

First of all, let’s analyze the startup phase that roughly corresponds to the first 10 rounds.

Since the system is built from scratch, the recommendation system lacks data to train the recommendation algorithm in the initial stage. Correspondingly, at this stage, using random recommendations and heuristic recommendations to solve the user's cold start problem is the top priority.

Due to the inability to use more accurate algorithm recommendations, the recommendation results at this stage are often unsatisfactory in terms of interest matching. Therefore, the like behavior at this stage is mainly driven by news quality, reflecting In terms of quantitative indicators, there is a strong positive correlation between quality and popularity.

Going a step further, we can locate the two main driving forces of community evolution in the startup stage:

1) Quality feedback loops, that is, quality and popularity promote each other based on a positive correlation. That is, the better the thing, the more people will like it, and the more people will like it. Author The higher the income, the more motivated the author will be to produce better quality news reports;

2) Interest-quality confusion, that is, after accumulating enough information to accurately estimate user interest Before the amount of data, recommendation algorithms would confuse quality-driven like behavior into behavior triggered by user interest. These two driving forces promote each other, allowing popular content creators to gain gradually increasing over-exposure (reflected in the rise of the creator and news Gini index), and further squeeze the satisfaction of users' personalized interests (reflected in the user Decrease in the latent spatial similarity between its liked news). But most users can still benefit from enhanced news quality (reflected by the decreasing Gini index of user like behavior).

To summarize, the fourth finding can be obtained: In the startup phase, the system accumulates data for estimating user interests from random recommendations and high-quality news, and then solves the problem of cold Start user questions. Quality feedback loops and interest-quality confusion contribute to the emergence of extremely popular content creators through overexposure.

5. Growth stage

##With the accumulation of data, the recommendation algorithm’s estimate of user interest becomes more and more accurate, the like behavior gradually shifts from quality-driven to interest-driven, and the correlation between quality and popularity gradually weakens. As the number of simulation rounds increases, the news created during the startup period gradually expires and withdraws from recommendation candidates. The interest-quality confusion first begins to dissipate, and gradually leads to the final end of the quality feedback loop.

In the growth stage, the news density in the user area of ​​each circle is uneven. The density is higher in the direction of mainstream news topics, while the density in other directions is relatively low.

The result is that the news that users like is statistically more likely to be closer to mainstream news topics. This subtle deviation in like behavior continues to appear, and user interest continues to strengthen. Under the influence, it is gradually approaching mainstream news topics.

On the contrary, users outside the circle are stuck in the deadlock of "no likes - algorithm recommendations cannot cover - low recommendation accuracy - and even less likes". They will occasionally like the news because of its quality, but the recommendation algorithm cannot accumulate enough data within the data time limit to estimate their interest. Increases in news quality were spurred by more frequent and balanced like behavior, but news quality weighted by the number of likes remained generally stable as the popularity of high-quality news declined.

With the termination of the quality feedback loop, content creators can no longer receive excessive attention, resulting in a decline in news quality. Users who are sensitive to quality may stop liking it, leading to a decline in user coverage.

To sum up, we can get the fifth finding: In the growth stage, users in the circle evolve towards common topics under the influence of distribution deviation, while users outside the circle are stuck in a deadlock , leading to user differentiation. More and more accurate algorithm recommendations lead to the end of the quality feedback loop, and the community loses some quality-sensitive users as a result.

6. Maturity and decline stages

## Around round 20, the community enters the maturity and decline stage, when most key indicators stabilize.

At this stage, users in the circle dynamically stay in the bubble of common topics. Although their interests may shift to the edge of the bubble by clicking on some different news, they It will soon return to the center due to density difference.

The Gini index of likes for news is higher, while the Gini index of likes for content creators is lower. This shows that even if the news is created by the same creator, its popularity varies greatly. Big difference.

In addition to the greedy creation mechanism, the news creation process itself is highly random, so bubbles also show a natural expansion trend.

The expanding bubble brings more diverse news candidates, and also causes some users who are sensitive to topics to gradually withdraw.

The sixth discovery can be obtained from this: In the maturity and decline stages, users in the circle share common topics, and content creators publish various news around these topics. The community has maintained a stable and slow expansion, but at the same time it has also lost some users who are sensitive to interests.

7. How does evolution happen?

Discovery one to six answered the first research question that the research team focused on: News recommendation ecosystem (News What are the characteristics of each stage of the life cycle of Recommendation Ecosystems (NREs)?

Next, let’s put all the knowledge together and try to answer the second research question: What are the key factors driving the evolution of NREs, and how do these factors interact with each other to affect the evolutionary process? ?

The following figure summarizes the key factors and influencing mechanisms of the evolution of online news communities. From it, we can find that the re-emergence of exposure bias and deadlock is the main reason why users in the circle and those outside the circle are The direct cause of different evolutionary trends has further led to the differentiation of users and the convergence of topics.

Fudan releases News Recommendation Ecosystem Simulator SimuLine: a single machine supports 10,000 readers, 1,000 creators, and 100+ rounds of recommendations

The re-emergence of exposure bias is caused by a combination of factors.

First of all, from the perspective of information theory, the recommendation algorithm can be explained as a process of information compression, which inevitably leads to popularity bias, in which news that appears frequently in the data set ( That is, news with more likes) are encoded more effectively to improve recommendation performance. Reflected in the evolution process of the community, it is reflected that widely discussed common topics will seize the exposure resources of personalized topics on algorithm recommendation channels.

Secondly, due to the profit-seeking nature of content creators, they are more motivated to create news around topics of public interest, which will naturally lead to a shift in the density of news releases from mass topics to personalized ones. The topic is reduced. In this sense, even if random recommendations are used throughout the process, the community may develop in the direction of topic convergence due to distribution deviations.

Finally, filter bubbles and exposure bias promote each other, which together lead to a subtle shift in user interest. The algorithm recommends similar reports based on the news that users have liked in history. The limited news exposure makes the exposure bias more difficult for users to perceive.

In addition, the recommendation system’s bias towards popular news shows different impacts at different stages of evolution.

In the startup stage, there is interest-quality confusion, there is a strong correlation between news quality and popularity, and the popularity bias is specifically reflected in the enhancement of the exposure of high-quality news.

With the accumulation of data and the improvement of algorithm recommendation performance, like behavior is increasingly driven by interest compared with quality-driven, thus weakening interest-quality confusion and quality- Popularity correlation. Popularity bias has also gradually evolved from recommending high-quality news to simply recommending highly popular news.

In this process of conversion of old and new momentum, cultivating some highly popular and high-quality news topics plays an important role in promoting user participation.

To sum up, we can get the seventh discovery: Popular bias, news distribution bias and filter bubbles jointly lead to exposure bias, which affects user differentiation and topic convergence. The key factor. High-quality news with high popularity is crucial to breaking the deadlock among users outside the circle.

8. How to avoid community decline?

Finally, with the help of SimuLine’s powerful simulation and analysis capabilities, we will explore the third research question: How Through the design strategy of the recommendation system, can we achieve better long-term multi-party effectiveness and avoid the community from falling into "decline"?

The research team tested four of the most basic and common heuristic recommendation methods: subscription-based news cold start, hot search list, topic promotion and creator promotion. The following three figures present the community evolution results of applying the above four methods on the basic recommendation system.

Fudan releases News Recommendation Ecosystem Simulator SimuLine: a single machine supports 10,000 readers, 1,000 creators, and 100+ rounds of recommendations

Fudan releases News Recommendation Ecosystem Simulator SimuLine: a single machine supports 10,000 readers, 1,000 creators, and 100+ rounds of recommendations

Fudan releases News Recommendation Ecosystem Simulator SimuLine: a single machine supports 10,000 readers, 1,000 creators, and 100+ rounds of recommendations

# (1) The subscription-based news cold start attempts to form a stable cross-round exposure relationship between users and content creators, thereby enhancing the quality feedback loop that occurs in the startup stage.

However, this approach has led to a serious monopoly. Content creators who have not achieved first-mover advantage will be suppressed by the quality feedback loop, destroying algorithm coverage and the average quality of news. In turn, the ecological diversity of the entire community is seriously challenged.

(2) Hot search list is the most common online community component, relying on the positive correlation between news quality and popularity. This method can provide users with higher quality News recommendations. At the same time, from the perspective of exploitation and exploration, reading breaking news can also be regarded as a kind of user exploration that breaks through the limitations of the user's existing interests, which helps to reduce the negative impact of filter bubbles.

However, this approach cannot prevent the collapse of the correlation between popularity and quality discussed in the previous article, which can lead to a decrease in the effectiveness of recommending breaking news.

(3) Finally, there is platform promotion. By providing additional exposure to specific topics or specific authors, the platform can also actively regulate the recommended content. Promoting content creators can build a stable exposure relationship, and then use a quality feedback loop to cultivate high-quality news with high popularity.

But unlike the news cold start strategy based on subscription, promotion can be actively terminated before the current quality feedback loop cultivates a harmful monopoly, thus ensuring user experience and creation. the creativity of the person. As a news dissemination channel independent of interest matching, it can also mitigate the negative effects of filter bubbles. Furthermore, by reconstructing the quality feedback loop, it also directs the recommendation system's bias toward popular news toward beneficial recommendations toward high-quality news.

SimuLine randomly selects topics in experiments targeting specific topic promotion, which means that popular topics and personalized topics have the same chance of being promoted, so for personalities with relatively low exposure topic, the impact of promotion is relatively large.

This method can theoretically be used to increase the participation of users outside the circle. However, since the quality of promoted news cannot be guaranteed, the amount of exposure is difficult to convert into the number of likes, resulting in this method. The effect is limited.

To sum up, we can get the eighth finding: Among the common recommendation system design strategies, periodic promotion for content creators is the most effective. By actively building a quality feedback loop, it can create waves of popular and high-quality news topics throughout the community, while the platform can control monopoly through regular resets.

Summary

In this article, the CISL research team designed and developed SimuLine, a simulation platform for analyzing the evolution process of the news recommendation ecosystem, and A detailed analysis of the evolution process of online news communities was conducted based on SimuLine.

SimuLine constructs an understandable latent space that well reflects human behavior, and based on this, conducts a detailed simulation of the news recommendation ecosystem through agent-based modeling .

The research team analyzed the entire life cycle of the evolution of online news communities, including startup, growth, maturity and decline stages, analyzed the characteristics of each stage, and proposed a relationship diagram to illustrate the evolution process. Key factors and influencing mechanisms.

Finally, the research team explored the impact of recommendation system design strategies on community evolution, including the use of subscription-based news cold start, hot news and platform promotion.

In the future, the CISL research team will consider text content generation for news and behavioral modeling of social network activities to conduct more powerful and realistic simulations.

The research team believes that SimuLine can also be used as a great tool for recommender system evaluation, providing a third option besides online user experiments and offline experiments based on data sets (this is also a great tool for it The main reason for the name SimuLine).

The research team also noticed that the recommendation system research community has recently proposed a series of bias-correcting recommendation algorithms, aiming to deal with the exposure bias problem in recommendations, which is also a factor in user differentiation and topic convergence. direct cause.

Since this article focuses on discussing the system design of the recommendation system rather than the specific recommendation algorithm, the research team leaves this issue as an open topic and hopes that SimuLine can promote future research in this direction. Research.

The above is the detailed content of Fudan releases 'News Recommendation Ecosystem Simulator' SimuLine: a single machine supports 10,000 readers, 1,000 creators, and 100+ rounds of recommendations. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:51cto.com. If there is any infringement, please contact admin@php.cn delete
Previous article:Can ChatGPT design bots?Next article:Can ChatGPT design bots?