PageRank, that is, web page ranking (also known as page level, Google left-side ranking or Page ranking), is a link analysis algorithm proposed by Google founders Larry Page and Sergey Brin when building an early search system prototype in 1997. Since Google achieved unprecedented commercial success, the algorithm has attracted great attention from other search engines and from academic circles, and many important link analysis algorithms are derived from it. PageRank is a method used by Google to identify the level/importance of web pages. It is one of the criteria Google uses to measure the quality of a site.
After combining all other factors, such as the Title tag and Keywords tag, Google uses PageRank to adjust the results so that pages with higher rank/importance are placed higher in the search results, which improves the relevance and quality of those results. PR levels range from 0 to 10, with 10 being the top score. The higher the PR value, the more popular (more important) the page is. For example, a PR value of 1 indicates that a site is not very popular, while a PR value of 7 to 10 indicates that it is very popular (or extremely important). Generally, a site whose PR value reaches 4 is considered a good site. Google sets the PR value of its own site to 10, which indicates that Google's site is very popular, or equivalently, very important.
Before PageRank was proposed, some researchers had already suggested using the number of incoming links to a web page for link analysis. Under this method, the more incoming links a web page has, the more important it is. Many early search engines adopted the incoming link count as their link analysis method, and it noticeably improved their search results. In addition to the number of incoming links, PageRank also takes the quality of the linking pages into account. Combining the two gives a better standard for evaluating web page importance.
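As a rough illustration of the earlier link-counting approach described above, the following Python sketch ranks pages purely by their number of incoming links. The toy graph `links` and the helper name `count_incoming_links` are hypothetical and only serve the example.

```python
from collections import Counter

# Hypothetical toy graph: each page maps to the pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def count_incoming_links(links):
    """Return a dict mapping every page to its number of incoming links."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # count each linking page only once
            counts[target] += 1
    return dict(counts)

incoming = count_incoming_links(links)
# Sort by descending link count, breaking ties alphabetically.
ranking = sorted(incoming, key=lambda page: (-incoming[page], page))
print(incoming)
print(ranking)
```

Note that this measure treats every incoming link as equally valuable, which is the limitation PageRank addresses by also weighing the quality of the linking page.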
For a web page A on the Internet, the calculation of its PageRank is based on the following two basic assumptions:
Quantity assumption: In the Web graph model, the more incoming links a page node receives from other web pages, the more important the page is.
Quality assumption: The pages that link to page A differ in quality, and high-quality pages pass more weight to other pages through their links. Therefore, the more high-quality pages point to page A, the more important page A is.
Using the above two assumptions, the PageRank algorithm initially gives every web page the same importance score, and then updates the PageRank score of each page node through iterative calculation until the scores are stable. The result calculated by PageRank is an importance evaluation of the web page that has nothing to do with the query entered by the user; that is, the algorithm is topic-independent. Suppose a search engine ignored content similarity entirely and ranked results by PageRank alone. How would it behave? It would return the same results for every query, namely the pages with the highest PageRank values.
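The thought experiment above can be sketched in a few lines of Python. The scores in `pagerank_scores` are made-up illustrative values, and `search_by_pagerank_only` is a hypothetical function, not part of any real search engine.

```python
# Hypothetical, precomputed PageRank scores (illustrative values only).
pagerank_scores = {"A": 0.4, "B": 0.2, "C": 0.4}

def search_by_pagerank_only(query, scores, top_k=2):
    """Rank pages by PageRank alone; the query is deliberately ignored."""
    ordered = sorted(scores, key=lambda page: (-scores[page], page))
    return ordered[:top_k]

# Two unrelated queries produce identical result lists.
first = search_by_pagerank_only("php tutorial", pagerank_scores)
second = search_by_pagerank_only("weather today", pagerank_scores)
print(first, second)
```

Because the `query` argument never influences the ordering, every request gets the same answer, which is why PageRank has to be combined with content relevance in practice.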
PageRank algorithm principle
The calculation of PageRank makes full use of the two assumptions: the quantity assumption and the quality assumption.
The process is as follows:
In the initial stage: the web pages form a Web graph through their link relationships, and every page is assigned the same PageRank value. After several rounds of calculation, each page obtains its final PageRank value. As each round proceeds, the current PageRank value of each page is continuously updated.
How scores are updated in one round: each page distributes its current PageRank value evenly across the outgoing links it contains, so that each link carries a corresponding weight. Each page then sums the weights passed in through all incoming links pointing to it to obtain its new PageRank score. Once every page has obtained its updated PageRank value, one round of PageRank calculation is complete.
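A single round of the update described above might look like the following Python sketch. It assumes a hypothetical three-page graph in which every page has at least one outgoing link and every link target is itself a key of `links`; the function name `pagerank_round` is chosen only for this example.

```python
def pagerank_round(links, scores):
    """Run one round of the simplified update: split, pass on, sum up."""
    new_scores = {page: 0.0 for page in links}
    for page, targets in links.items():
        # Each page splits its current score evenly over its outgoing links.
        share = scores[page] / len(targets)
        for target in targets:
            # Each page sums the weights arriving over its incoming links.
            new_scores[target] += share
    return new_scores

# Hypothetical toy graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

# Initial stage: every page starts with the same score.
scores = {page: 1.0 / len(links) for page in links}

scores = pagerank_round(links, scores)
print(scores)
```

After this one round, C holds half of the total score because it receives all of B's score and half of A's, while the total across all pages still sums to 1.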
Basic idea:
Suppose web page T has a link to web page A. This means the owner of T considers A important, so part of T's importance score is assigned to A. The score passed on is: PR(T)/L(T)
where PR(T) is T's PageRank value and L(T) is T's number of outgoing links. A's PageRank value is the sum of the scores it receives in this way from all pages that, like T, link to it.
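Written out as a formula, and assuming the pages linking to A are labeled T_1, ..., T_n (a labeling introduced here only for illustration), the simplified score described above is:

```latex
PR(A) = \sum_{i=1}^{n} \frac{PR(T_i)}{L(T_i)}
```

For instance, if A is linked from a page with score 0.2 and two outgoing links and from a page with score 0.3 and three outgoing links, then PR(A) = 0.2/2 + 0.3/3 = 0.2.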
That is, the number of votes a page gets is determined by the importance of all pages linking to it. A hyperlink to a page is equivalent to casting one vote for that page. The PageRank of a page is obtained through a recursive algorithm based on the importance of all the pages that link to it. A page with more incoming links will have a higher rank, whereas a page with no incoming links at all will have no rank.
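Putting the pieces together, the following Python sketch repeats the round-by-round update until the scores stop changing. It is a minimal illustration of the simplified scheme described in this article, not Google's production method. The names `pagerank`, `tolerance` and `max_rounds` are assumptions made for the example, and the toy graph is chosen so that every page has an outgoing link and the scores settle.

```python
def pagerank(links, tolerance=1e-10, max_rounds=1000):
    """Iterate the simplified PageRank update until the scores are stable."""
    pages = list(links)
    # Initial stage: every page gets the same score.
    scores = {page: 1.0 / len(pages) for page in pages}
    for _ in range(max_rounds):
        new_scores = {page: 0.0 for page in pages}
        for page, targets in links.items():
            share = scores[page] / len(targets)  # PR(T) / L(T)
            for target in targets:
                new_scores[target] += share
        # Total absolute change between two consecutive rounds.
        change = sum(abs(new_scores[p] - scores[p]) for p in pages)
        scores = new_scores
        if change < tolerance:
            break
    return scores

# Hypothetical toy graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
final_scores = pagerank(links)
print(final_scores)
```

On this graph the scores settle near 0.4 for A, 0.2 for B and 0.4 for C. The simplified update is not guaranteed to converge on arbitrary graphs, for example when some page has no outgoing links. The full PageRank formulation adds a damping factor for that reason, which this article does not cover.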