Home >Common Problem >10-minute quick solution | Large-scale distributed e-commerce system architecture
#This article is a technical summary of learning large-scale distributed website architecture. Provides a brief description of the architecture of a high-performance, high-availability, scalable and extensible distributed website, and gives an architectural reference. Part of the article is reading notes, and part is a summary of personal experience, which has good reference value for large-scale distributed website architecture.
is user-centered and provides a fast web access experience. The main parameters are short response time, large concurrent processing capability, high throughput and stable performance parameters.
It can be divided into front-end optimization, application layer optimization, code layer optimization and storage layer optimization.
Large websites should be accessible at all times and provide normal external services. . Because of the complexity, distribution, cheap servers, open source databases, operating systems and other characteristics of large websites, it is difficult to ensure high availability, which means website failures are inevitable.
How to improve availability is a problem that needs to be solved urgently. First of all, we need to consider it from the architectural level, and consider availability when planning. In the industry, several nines are generally used to represent availability indicators, such as four nines (99.99), and the allowed unavailability time in a year is 53 minutes.
Different strategies are used at different levels. Redundant backup and failover are generally used to solve high availability problems.
Scalability refers to increasing/reducing the system's processing capabilities by adding/reducing hardware (servers) without changing the original architecture design.
can easily add/remove functional modules and provide code /Good extensibility at module level.
Have effective solutions to known problems and establish unknown/potential problems Discovery and defense mechanisms. For security issues, we must first improve security awareness and establish an effective security mechanism to ensure it from the policy level and organizational level. For example, server passwords cannot be leaked, passwords are updated monthly, and cannot be repeated within three times; weekly security scans, etc. Strengthen the construction of the safety system in an institutionalized manner. At the same time, attention needs to be paid to all aspects related to safety. Security issues cannot be ignored, including infrastructure security, application system security, data confidentiality and security, etc.
Commonly used encryption and decryption algorithms (single hash encryption [MD5, SHA], symmetric encryption [DES, 3DES, RC]) , asymmetric encryption [RSA], etc.
The architectural design and operation and maintenance management of the website must adapt to changes and provide high scalability and scalability. Conveniently cope with rapid business development, sudden increase in high-traffic access and other requirements.
In addition to the architectural elements introduced above, it is also necessary to introduce the ideas of agile management and agile development. Unify business, products, technology, and operation and maintenance, adapt to needs, and respond quickly.
The above uses a seven-layer logical architecture, the first layer is the customer layer , the second layer of front-end optimization layer, the third layer of application layer, the fourth layer of service layer, the fifth layer of data storage layer, the sixth layer of big data storage layer, and the seventh layer of big data processing layer.
A mature large-scale website ( The system architecture of Taobao, Tmall, Tencent, etc.) is not designed with complete features such as high performance, high availability, and high scalability from the beginning. It gradually evolves and improves as the number of users increases and business functions expand. , in this process, the development model, technical architecture, and design ideas have also undergone great changes. Even the technical staff has developed from a few people to a department or even a product line.
So the mature system architecture is gradually improved with the expansion of the business, and is not achieved overnight; systems with different business characteristics will have their own focuses, such as Taobao, which needs to solve the search for massive product information. , place orders, and pay; for example, Tencent needs to handle real-time message transmission for hundreds of millions of users; Baidu needs to handle massive search requests.
They all have their own business characteristics and the system architecture is also different. Despite this, we can also find common technologies from these different website backgrounds. These technologies and methods are widely used in the architecture of large-scale website systems. Let’s understand these technologies and methods by introducing the evolution process of large-scale website systems. means.
In the initial architecture, applications, databases, and files are all deployed on one server, as shown in the figure:
With the expansion of business, one server can no longer satisfy Performance requirements, so applications, databases, and files are deployed on separate servers, and different hardware is configured according to the server's purpose to achieve the best performance results.
While optimizing performance through hardware, we also Software performs performance optimization. In most website systems, caching technology is used to improve system performance. The use of caching is mainly due to the existence of hot data. Most website visits follow the 28 principle (that is, 80% of access requests are eventually fulfilled). on 20% of the data), so we can cache hotspot data, reduce the access paths of these data, and improve user experience.
#The common ways to implement cache are local cache and distributed cache. Of course, there are also CDNs, reverse proxies, etc., which will be discussed later. Local cache, as the name suggests, caches data locally on the application server. It can be stored in memory or in files. OSCache is a commonly used local cache component. The characteristic of local cache is that it is fast, but because the local space is limited, the amount of cached data is also limited. The characteristic of distributed cache is that it can cache massive amounts of data and is very easy to expand. It is often used in portal websites and is not as fast as local cache. Commonly used distributed caches are Memcached and Redis.
The application server, as the entrance to the website, will bear a large number of requests. We often use the application server to cluster to share the number of requests. A load balancing server is deployed in front of the application server to schedule user requests and distribute the requests to multiple application server nodes according to the distribution policy.
Commonly used load balancing technology hardware includes F5, which is relatively expensive, and software includes LVS, Nginx, and HAProxy. LVS is a four-layer load balancing, which selects internal servers based on the target address and port. Nginx and HAProxy are seven-layer load balancing, which can select internal servers based on message content. Therefore, the LVS distribution path is better than Nginx and HAProxy, and the performance is higher. Nginx and HAProxy are more configurable, and can be used for dynamic and static separation (choose a static resource server or an application server based on the characteristics of the request message).
With the increase in the number of users, the database has become the biggest bottleneck. Commonly used methods are used to improve database performance. The method is to separate read and write and sub-database and table. As the name suggests, read-write separation is to divide the database into a read database and a write database, and achieve data synchronization through the main and backup functions. Database sharding and table sharding are divided into horizontal sharding and vertical sharding. Horizontal sharding is to split a very large table in a database, such as a user table. Vertical segmentation is based on different businesses. For example, tables related to user business and product business are placed in different databases.
If our servers are deployed In the computer room in Chengdu, access is faster for users in Sichuan, but slower for users in Beijing. This is because Sichuan and Beijing belong to different developed regions of China Telecom and China Unicom respectively, and users in Beijing need to access the Internet through the Internet. The router takes a long path to access the server in Chengdu, and the return path is the same, so the data transmission time is relatively long. For this situation, CDN is often used to solve the problem. CDN caches the data content to the operator's computer room. When users access the data, they first obtain the data from the nearest operator, which greatly reduces the network access path. More professional CDN operators include Lanxun and Wangsu.
The reverse proxy is deployed in the computer room of the website. When the user request arrives, the reverse proxy server is first accessed. The reverse proxy server returns the cached data to the user. If there is no cached data, it will continue. Access the application server to obtain, which reduces the cost of obtaining data. Reverse proxies include Squid and Nginx.
The number of users is increasing day by day, and the business volume is increasing Large, more and more files are generated, and a single file server can no longer meet the demand. At this time, the support of a distributed file system is needed. Commonly used distributed file systems include GFS, HDFS, and TFS.
For query and analysis of massive data, we use NoSQL databases plus search engines can achieve better performance. Not all data needs to be placed in relational data. Commonly used NoSQL include MongoDB, HBase, and Redis, and search engines include Lucene, Solr, and Elasticsearch.
As the business further expands, the application becomes very Bloated, then we need to split the application into services, such as Baidu into news, web pages, pictures and other services. Each business application is responsible for relatively independent business operations. Businesses communicate through messages or share databases.
At this time we found that each business application will use some Basic business services, such as user services, order services, payment services, and security services, are the basic elements that support various business applications. We extract these services and use the distributed service framework to build distributed services. Ali’s Dubbo is a good choice.
Distributed large-scale websites currently have several main categories:
Large portals are generally news information, which can be optimized using CDN, static and other methods. Kaixin.com and other websites are more interactive and may introduce more NoSQL and distributed cache. Use high-performance communication frameworks, etc. E-commerce websites have the characteristics of the above two categories. For example, product details can use CDN and are static. Those with high interactivity need to use NoSQL and other technologies. Therefore, we use e-commerce websites as a case for analysis.
Customer needs:
Customers are customers. They will not tell you what they want specifically, they will only tell you what they want. We often need to guide and explore customer needs. Fortunately, a clear reference website is provided. Therefore, the next step is to conduct a lot of analysis, combine the industry, and reference websites to provide customers with solutions. Requirements Function MatrixThe traditional approach to requirements management is to use use case diagrams or module diagrams (requirements lists) to describe requirements. Doing so often overlooks a very important requirement (non-functional requirement), so it is recommended that you use the requirements function matrix to describe requirements. The demand matrix of this e-commerce website is as follows:
However, the current mainstream website architecture has undergone earth-shaking changes. Generally, a cluster approach is used for high-availability design. At least it looks like this:
Do you regret not studying mathematics well? ! (I don’t know if the above calculation is wrong, haha~~) Server estimate: (take tomcat server as an example) According to a web server, it supports 300 concurrent calculations per second. Normally 10 servers are needed (approximately equal to); [tomcat default configuration is 150], 30 servers are needed during peak periods; capacity estimate: 70/90 principle The system CPU is generally maintained at about 70%, and reaches 90% during peak periods. This does not waste resources and is relatively stable. Memory and IO are similar. The above estimates are for reference only, because server configuration, business logic complexity, etc. all have an impact. Here the CPU, hard disk, network, etc. are no longer evaluated. 5. Website architecture analysis Based on the above estimation, there are several problems:
Large websites generally need to do the following architecture optimization (optimization is something that needs to be considered when designing the architecture. It is usually solved from the architecture/code level. Tuning is mainly the adjustment of simple parameters, such as JVM tuning; if tuning involves a lot of code modification, it is not tuning, but refactoring):
Based on business attributes Vertical segmentation is divided into product subsystem, shopping subsystem, payment subsystem, review subsystem, customer service subsystem, and interface subsystem (interconnecting with external systems such as purchase, sale, inventory, SMS, etc.). Based on the level definition of business subsystems, they can be divided into core systems and non-core systems. Core system: product subsystem, shopping subsystem, payment subsystem; non-core: review subsystem, customer service subsystem, interface subsystem.
Architecture diagram after splitting:
Reference deployment plan 2
As shown above, each application is deployed separately, and the core system and non-core system are deployed in combination
Architecture diagram after cluster deployment:
Combined with Cache middleware, the distributed Session implemented can simulate the Session session very well.
Large websites need to store massive amounts of data. In order to achieve massive data storage , High availability and high performance generally adopt a redundant approach to system design. Generally, there are two ways to separate reading and writing and sub-database and table. Read and write separation: Generally, to solve the scenario where the read ratio is much greater than the write ratio, one primary and one standby, one primary and multiple standbys, or multiple primary and multiple standbys can be used. This case is based on business splitting, combined with database sharding, table sharding and read-write separation. As shown below:
Related middleware can refer to Cobar (Alibaba, currently no longer maintained), TDDL (Alibaba), Atlas (Qihoo 360), MyCat. Problems with sequences, JOIN, and transactions after sharding databases and tables will be introduced in the theme sharing of sharding databases and tables.
Extract functions/modules common to multiple subsystems and use them as public services. For example, the membership subsystem in this case can be extracted as a public service.
Message queue can solve subsystems/modules The coupling between them achieves an asynchronous, highly available, and high-performance system. It is the standard configuration of distributed systems. In this case, message queue is mainly used in shopping and delivery links.
Currently, the most commonly used MQs include Active MQ, Rabbit MQ, Zero MQ, MS MQ, etc. You need to choose according to the specific business scenario. It is recommended to study Rabbit MQ.
In addition to the business split introduced above, application clusters, multi-level cache, single sign-on, database Clustering, servitization, and message queues. There are also CDN, reverse proxy, distributed file system, big data processing and other systems. I won’t introduce it in detail here. You can ask Du Niang/Google. If you have the opportunity, you can also share it with everyone.
The above is the detailed content of 10-minute quick solution | Large-scale distributed e-commerce system architecture. For more information, please follow other related articles on the PHP Chinese website!