
You just bought a piece of clothing on Taobao - detailed analysis of the technical process (the process of displaying a page on Taobao's homepage)

WBOY
2016-06-24 11:48:18

Statement: I came across this article online some time ago and saved it on my computer. I can no longer find the original address, so I am noting that here. A salute to Internet engineers!

You realize that Chinese New Year is coming, so you decide to buy a sweater for your girlfriend and open www.taobao.com. Your browser first queries a DNS server, which converts www.taobao.com into an IP address. You will find that the resolved IP address is likely to differ depending on your region and your network (China Telecom, China Unicom, China Mobile). This is the first step of load balancing: when DNS resolves the domain name, it assigns your visit to one of several entrances and tries to make sure that the entrance you reach is the fastest one available to you (this is different from the CDN mentioned later).
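The idea of carrier-aware resolution can be sketched as follows. This is a minimal illustration, not Taobao's actual DNS setup: the carrier labels, the function `resolve`, and the addresses (taken from the reserved documentation ranges) are all assumptions made for the example.

```python
# Hypothetical mapping from a client's carrier to the entrance expected to be
# fastest for it. The addresses come from the reserved documentation ranges
# (RFC 5737) and do not belong to any real deployment.
ENTRY_POINTS = {
    "telecom": "192.0.2.10",
    "unicom": "198.51.100.10",
    "mobile": "203.0.113.10",
}
DEFAULT_ENTRY = "192.0.2.10"


def resolve(domain, client_carrier):
    """Return the entrance IP for `domain` as seen from `client_carrier`."""
    if domain != "www.taobao.com":
        raise KeyError(f"unknown domain: {domain}")
    # Unknown carriers fall back to a default entrance.
    return ENTRY_POINTS.get(client_carrier, DEFAULT_ENTRY)


print(resolve("www.taobao.com", "unicom"))  # 198.51.100.10
```

The same domain name yields a different answer for each carrier, which is why two users on different networks can see different IP addresses for www.taobao.com.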

Through that entrance you have reached the actual IP address of www.taobao.com. At this point you generate one PV (Page View). A website's total daily PV is an important indicator of its size. On ordinary, non-promotion days the PV of the whole Taobao network is between 1.6 and 2.5 billion. Meanwhile, as one independent user, you count as a single UV (Unique Visitor) no matter how many Taobao pages you open during this visit. The recently infamous 12306.cn peaked at around 1 billion PV per day, yet its UV was far less than a tenth of Taobao's. I believe everyone knows the reason for this.
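The difference between PV and UV is easy to see in a small sketch. The log format (pairs of user ID and URL) and the function name `count_pv_uv` are assumptions for illustration only.

```python
def count_pv_uv(visits):
    """Count page views and unique visitors.

    `visits` is an iterable of (user_id, url) pairs for one day.
    """
    pv = 0
    users = set()
    for user_id, _url in visits:
        pv += 1             # every page request is one page view
        users.add(user_id)  # a user counts once, however many pages they open
    return pv, len(users)


log = [
    ("alice", "/"),
    ("alice", "/search?q=sweater"),
    ("bob", "/"),
    ("alice", "/item/1"),
]
pv, uv = count_pv_uv(log)
print(pv, uv)  # 4 2
```

A site where a few users reload pages over and over has a high PV but a low UV, which is the pattern described for 12306.cn.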

So many people visit www.taobao.com at the same time that even the home page cannot be generated by a single server. There may be hundreds or even thousands of servers dedicated to generating the home page of www.taobao.com, so the task of building the page for your visit is assigned to one of them. The assignment has to be fair and even: each of these hundreds or thousands of servers should handle roughly the same number of users. This complex process is carried out by several systems, the most critical of which is LVS (Linux Virtual Server), one of the most popular load balancing systems in the world, developed by Dr. Zhang Wensong, who currently works at Taobao.
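One common way to keep the load even is least-connections scheduling, where each new request goes to the server with the fewest active connections. The sketch below only illustrates that idea in plain Python. It is not LVS code, and the class name and server names are hypothetical.

```python
class LeastConnectionsBalancer:
    """Toy scheduler: send each request to the least busy server."""

    def __init__(self, servers):
        # Active connection count per server, in insertion order.
        self.active = {name: 0 for name in servers}

    def assign(self):
        # min() returns the first server with the lowest count, so ties are
        # broken by the order in which servers were registered.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes.
        self.active[server] -= 1


lb = LeastConnectionsBalancer(["web-1", "web-2", "web-3"])
assigned = [lb.assign() for _ in range(6)]
print(assigned)   # ['web-1', 'web-2', 'web-3', 'web-1', 'web-2', 'web-3']
print(lb.active)  # {'web-1': 2, 'web-2': 2, 'web-3': 2}
```

With no requests finishing, the result is the same as simple round-robin. The two strategies diverge once some requests take longer than others and `release` is called at different times.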

After a series of complex logical operations and data processing, the HTML for the Taobao homepage shown to you is finally generated. Anyone with a little web front-end knowledge knows that the browser will next load the CSS, JS, images and other resource files used by the page. Fewer readers may know that a browser limits how many resources it loads concurrently from the same domain name: IE6 and IE7 allow two, IE8 allows six, and Chrome varies by version, usually four to six. I just checked, and visiting Taobao's homepage requires loading 126 resources, so with so few concurrent connections loading naturally takes a long time. Front-end developers therefore often spread these resource files across multiple domain names, which works around the browser limit and also prepares the ground for the CDN described below.
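A rough back-of-the-envelope sketch shows why spreading resources over several domains helps. It assumes, unrealistically, that every resource takes the same time to load. The host names and both function names are made up for the example.

```python
import math
import zlib


def loading_rounds(resources, per_host_limit, hosts=1):
    """Rough lower bound on sequential 'rounds' of requests, assuming every
    resource takes equally long and all hosts are used evenly."""
    return math.ceil(resources / (per_host_limit * hosts))


def shard_host(path, hosts):
    """Pick a host for `path` deterministically, so the same file always gets
    the same URL and stays cacheable."""
    return hosts[zlib.crc32(path.encode("utf-8")) % len(hosts)]


print(loading_rounds(126, 6))     # 21 rounds from a single domain
print(loading_rounds(126, 6, 4))  # 6 rounds when spread over four domains

hosts = ["img1.example.com", "img2.example.com", "img3.example.com"]
print(shard_host("/css/home.css", hosts))
```

Hashing the path rather than picking a host at random matters: a file that moved between host names on every page view would defeat the browser cache.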

According to unconfirmed reports, Taobao's traffic peaked at 871GB/s during Double Eleven. That is equivalent to about 1.78 million 4Mb home broadband lines running at full capacity, enough to saturate the entire Internet bandwidth of a small or medium-sized city. Obviously this traffic cannot all be concentrated in one place. Everyone also knows that access across different networks (China Telecom, China Unicom, etc.) and different regions is slow, yet you rarely find Taobao slow. This is the work of the CDN (Content Delivery Network). Taobao has built dozens or hundreds of CDN nodes across the country and uses various means to ensure that what you request (mainly JS, CSS, images, etc.) is served from the CDN node closest to you, so that heavy traffic is spread across the country and access is accelerated.
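The 1.78 million figure can be checked with a few lines of arithmetic, taking 1GB as 1024MB and 1 byte as 8 bits. The second function sketches one simple way to pick the "closest" node, by lowest measured latency. The node names and latency values are invented, and the source does not say how Taobao actually chooses a node.

```python
def home_lines_needed(peak_gb_per_s, line_mbit_per_s=4):
    """How many home broadband lines match a given peak traffic."""
    peak_mbit_per_s = peak_gb_per_s * 1024 * 8  # GB/s -> MB/s -> Mb/s
    return peak_mbit_per_s / line_mbit_per_s


def nearest_node(latency_ms):
    """Return the node with the lowest measured latency (hypothetical rule)."""
    return min(latency_ms, key=latency_ms.get)


print(f"{home_lines_needed(871):,.0f}")  # 1,783,808

measured = {"hangzhou": 8.0, "beijing": 31.5, "guangzhou": 27.0}
print(nearest_node(measured))  # hangzhou
```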

This raises a problem: if a seller lists a new item and uploads several new pictures of it, how does Taobao make sure those pictures are synchronized to CDN nodes across the country and available to users? This involves a great deal of content distribution and synchronization technology. Taobao developed a distributed file system, TFS (Taobao File System), to deal with such problems.

Now that the Taobao homepage has finally loaded, you habitually type 'sweater' into the search box on the homepage and hit Enter. You have just generated another PV, and Taobao's main search system starts working for you. It first segments your input into words using a word segmentation library. English is built from words separated by spaces, whereas Chinese is built from characters, and all the characters in a sentence run together to express a meaning. For example, the English sentence "I am a student" is "我是一个学生" in Chinese. A computer can easily tell from the spaces that "student" is a word, but it cannot easily tell that the characters "学" and "生" together form one word. Splitting a sequence of Chinese characters into meaningful words is called Chinese word segmentation (some people also call it word cutting). For "我是一个学生", the segmentation result is: 我 / 是 / 一个 / 学生 (I / am / a / student).
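Forward maximum matching is one classic dictionary-based way to segment Chinese text: at each position, take the longest dictionary word that fits. The sketch below uses a tiny made-up dictionary. The source does not say which algorithm Taobao's segmentation library uses, so treat this only as an illustration of the problem.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment `text` greedily, preferring the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word
        # is found. A single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Tiny illustrative dictionary, not a real segmentation lexicon.
DICTIONARY = {"我", "是", "一个", "学生", "毛衣"}

print(forward_max_match("我是一个学生", DICTIONARY))
# ['我', '是', '一个', '学生']
```

Greedy matching is simple but can mis-segment ambiguous strings, which is one reason production systems rely on larger dictionaries and statistical models.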

After word segmentation, the system also has to infer your shopping intent from the search terms you entered. Users typically search with one of the following intents: (1) Browsing: there is no clear shopping target or intent, and the user buys casually and emotionally while looking around. Example queries: "Ranking of the top 10 perfumes in 2010", "Popular sweaters in 2010", "How many types of zippo are there?"; (2) Query: there is some shopping intent, expressed as requirements on attributes. Example queries: "Mobile phone suitable for the elderly", "Watch for 500 yuan"; (3) Comparison: the shopping intent has been narrowed down to a few products. Example queries: "Nokia E71 E63", "akg k450 px200"; (4) Decided: a basic decision has been made and the user is focused on one product. Example queries: "Nokia N97", "IBM T60". Depending on the intent it infers, the main search shows completely different results.

After these steps, the main search system lists results based on the conditions above and on more complex ones, all computed by more than a thousand search servers. You then start clicking through the items found and viewing their detail pages. Frequent online shoppers will have noticed that after you buy a product, you can still view a snapshot of its detail page as it was at purchase time through 'Purchased Products', no matter how many times the merchant has modified the page since. This prevents merchants from going back on what they promised in the product details. Obviously, saving and quickly retrieving detail-page snapshots for tens of billions of transactions every year is not simple. It involves several cooperating systems, among the more important of which is Tair, a distributed KV storage solution developed in-house by Taobao.
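The snapshot idea maps naturally onto key-value storage: the order ID is the key and a frozen copy of the detail page is the value. The sketch below uses an in-memory dict as a stand-in. It does not use Tair's API, and the class, keys and fields are hypothetical.

```python
import copy


class SnapshotStore:
    """In-memory stand-in for a key-value store of purchase-time snapshots."""

    def __init__(self):
        self._data = {}

    def save(self, order_id, item_detail):
        # Deep-copy so later edits by the seller do not leak into the snapshot.
        self._data[order_id] = copy.deepcopy(item_detail)

    def load(self, order_id):
        # Return a copy so callers cannot alter the stored snapshot either.
        return copy.deepcopy(self._data[order_id])


item = {"title": "Wool sweater", "price": 199, "promises": ["free returns"]}
store = SnapshotStore()
store.save("order-1001", item)

# The seller edits the live listing after the purchase.
item["price"] = 259
item["promises"].remove("free returns")

print(store.load("order-1001"))
# {'title': 'Wool sweater', 'price': 199, 'promises': ['free returns']}
```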

Whether or not you actually make a purchase, your visits are faithfully recorded by the system for later business logic and data analysis. Among these records, the access log is one of the most important. As we saw earlier, however, visits are spread across many different servers in many regions, and because there are so many users the logs are very large. Reaching the TB level is quite normal. To transmit and synchronize this log data quickly and promptly, Taobao developed TimeTunnel, which transfers data in real time and hands it to back-end systems for report calculation and other operations.

Your browsing data, transaction data and many other records are retained, so the historical data stored by Taobao easily reaches ten PB or more (1PB = 1024TB = 1048576GB). This huge volume of data is stored in Taobao's data warehouse with extreme compression of 1:120, and it is continuously analyzed and mined by a very large-scale data system called Yunti (云梯), which consists of more than 2,000 servers.
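The unit conversion and the effect of a 1:120 compression ratio can be worked through in a few lines. The example assumes that the ten PB figure refers to raw, uncompressed data, which the text does not state, so the compressed size is only illustrative.

```python
TB_PER_PB = 1024
GB_PER_TB = 1024


def pb_to_gb(pb):
    """Convert petabytes to gigabytes using binary multiples."""
    return pb * TB_PER_PB * GB_PER_TB


def compressed_tb(raw_pb, ratio=120):
    """Size in TB after compressing `raw_pb` petabytes at 1:`ratio`."""
    return raw_pb * TB_PER_PB / ratio


print(pb_to_gb(1))                  # 1048576
print(round(compressed_tb(10), 1))  # 85.3 (assuming 10 PB is the raw size)
```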

From this data, Taobao can learn who you are, what you like, how old your child is, whether you are in a relationship, what kind of drinks World of Warcraft players prefer, and so on. It also holds a huge amount of information on retail conditions in various industries, the rise and fall of various commodities, and more.

For all that has been said, I have described only a few of the thousands of systems running at Taobao. Even a single visit to Taobao's homepage involves technology and systems on a scale that is hard to imagine. They are the work of more than 2,000 top Taobao engineers, including Yangtze River Scholars, winners of the National Science and Technology Supreme Award, and many other great names. Likewise, the business systems of Baidu, Tencent and others are by no means simpler than Taobao's. What you should know is that the Internet products you use every day may look simple and easy to use, but behind them lies an unimaginable amount of ingenuity and labor.
