
How to build a highly scalable website?

伊谢尔伦 (original article), 2016-11-29 09:11:29

This article summarizes the key content of the book "50 Principles of Highly Scalable Websites".

The blogger has no hands-on architecture experience and limited breadth of knowledge, so this post can only systematically summarize the book's key points and add some conclusions based on personal understanding.

Main content

The book offers 50 recommendations on high scalability from several angles. A highly scalable website can expand its architecture freely as the business grows and the number of users increases, and can therefore cope easily with rapid growth. Let's look at the specific content of the book:


Simplify the equation

 1 Don’t over-design

 Over-design adds complexity and maintenance cost to the system, yet the extra design contributes little in normal use. It is often a feature that the designer considers important or a nice finishing touch, but it is of little practical use.

 2 Consider scalability when designing

 Follow these capacity guidelines: design for 20 times the current capacity, implement for 3 times, and deploy for 1.5 times. This avoids the difficulty of having to expand the system in a hurry when the business grows.
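As a quick worked example of these multipliers, here is a minimal Python sketch. It assumes capacity can be expressed as a single number, such as peak requests per second; the function name and the 1,000 requests/second figure are illustrative only.

```python
# Capacity multipliers from rule 2: design for 20x, implement for 3x,
# deploy for 1.5x the current peak load.
DESIGN_FACTOR = 20
IMPLEMENT_FACTOR = 3
DEPLOY_FACTOR = 1.5


def capacity_targets(current_peak: float) -> dict[str, float]:
    """Return the load each stage should be able to handle."""
    return {
        "design": current_peak * DESIGN_FACTOR,
        "implement": current_peak * IMPLEMENT_FACTOR,
        "deploy": current_peak * DEPLOY_FACTOR,
    }


if __name__ == "__main__":
    # Hypothetical current peak of 1,000 requests per second.
    print(capacity_targets(1000))
    # {'design': 20000, 'implement': 3000, 'deploy': 1500.0}
```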

 3 Simplify the plan

Follow the Pareto principle: 20% of the design does 80% of the work, so 80% of the time should be spent on that 20% of the design.

 A product's main functions are concentrated in a few key points. Once these are designed well, the rest are additional features, so this core business must be simple and easy to use.

 4 Reduce DNS queries

 Every distinct domain that a page loads files from requires its own DNS lookup. For example, cnblogs.com and i.cnblogs.com are different domains, so loading files from both triggers two DNS queries. When traffic is heavy, these extra lookups have a noticeable impact.
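To estimate how many lookups a page triggers, count the distinct hostnames among its resources. A small Python sketch; the resource URLs are hypothetical, and real browsers may skip lookups whose answers are already cached:

```python
from urllib.parse import urlsplit


def distinct_hostnames(resource_urls: list[str]) -> set[str]:
    """Each distinct hostname needs its own DNS lookup (unless cached)."""
    return {urlsplit(url).hostname for url in resource_urls}


# Hypothetical resources loaded by one page.
page_resources = [
    "https://cnblogs.com/index.html",
    "https://cnblogs.com/css/site.css",
    "https://i.cnblogs.com/img/logo.png",
    "https://i.cnblogs.com/img/banner.png",
]
print(len(distinct_hostnames(page_resources)))  # 2 lookups, not 4
```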

 5 Reduce the number of objects as much as possible

 The browser has to fetch every object (image, script, stylesheet, and so on) on a page. Reduce the number of requested files (how much this helps depends on how many files the browser loads in parallel) and merge objects where possible; for example, icon files can be combined into one large image. A reasonable number of files speeds up page loading in the browser.
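A deliberately simplified model of why fewer objects help: if the browser fetches at most a fixed number of files in parallel, load time grows with the number of request rounds. The limit of 6 parallel connections is an assumption for illustration, and the model ignores file sizes and latency variation:

```python
import math


def request_rounds(object_count: int, parallel_connections: int) -> int:
    """Simplified model: objects are fetched in batches of
    `parallel_connections`, so loading takes roughly this many rounds."""
    return math.ceil(object_count / parallel_connections)


# Assumption: the browser opens 6 parallel connections to the host.
print(request_rounds(30, 6))  # 30 separate icon files -> 5 rounds
print(request_rounds(1, 6))   # one merged image -> 1 round
```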

 6 Use the same brand of network equipment

 A single HTTP request may pass through many physical devices, such as load balancers, switches, and routers. Using equipment from the same vendor helps avoid unexpected compatibility problems.

Distributed work


 7 X-axis, cloning identical things

 Common approaches include clustering and load balancing, as well as read/write separation in the database.
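A minimal Python sketch of X-axis cloning combined with read/write separation: identical read replicas are used in round-robin order, and every write goes to a single primary. The server names are hypothetical, and the "anything that is not a SELECT is a write" rule is a simplifying assumption:

```python
import itertools


class ReadWriteRouter:
    """Round-robin reads across identical replicas; send writes to the primary."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        # Simplification: treat anything that is not a SELECT as a write.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route("SELECT name FROM users WHERE id = 1"))   # db-replica-1
print(router.route("SELECT total FROM orders WHERE id = 9"))  # db-replica-2
print(router.route("UPDATE users SET name = 'a' WHERE id = 1"))  # db-primary
```

Adding capacity for reads is then a matter of cloning another replica and adding it to the list.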

 8 Y-axis, splitting different things

 In a large system, split different functions apart, such as registration, purchasing, search, and cloud storage.
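A Y-axis split can be pictured as routing each request to the server pool that owns its function. In this Python sketch the URL prefixes and pool names are hypothetical:

```python
# Each business function is served by its own pool of servers.
SERVICE_POOLS = {
    "/register": "registration-pool",
    "/buy": "purchase-pool",
    "/search": "search-pool",
    "/disk": "cloud-disk-pool",
}


def route_by_function(path: str) -> str:
    """Pick the server pool responsible for the request's function."""
    for prefix, pool in SERVICE_POOLS.items():
        if path == prefix or path.startswith(prefix + "/"):
            return pool
    raise LookupError(f"no service handles {path!r}")


print(route_by_function("/buy/cart"))  # purchase-pool
```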

 9 Z-axis, splitting similar things

 For example, split according to the user's level, or the user's geographical location, etc.
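A Python sketch of both Z-axis variants. The shard count and region-to-shard mapping are hypothetical; `zlib.crc32` is used because Python's built-in `hash()` of strings changes between processes, which would send the same user to different shards:

```python
import zlib

SHARD_COUNT = 4  # hypothetical number of identical shards


def shard_for_user(user_id: str) -> int:
    """Split by customer: the same user always lands on the same shard."""
    return zlib.crc32(user_id.encode("utf-8")) % SHARD_COUNT


def shard_for_region(region: str) -> str:
    """Split by geography, with one hypothetical shard per region."""
    return {"cn": "shard-asia", "us": "shard-americas"}.get(region, "shard-default")
```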

Horizontal scaling design

 10 Design a horizontal scaling plan

There are two ways to scale: horizontally and vertically. Horizontal scaling copies and clones applications and adds machines to clusters of small, inexpensive servers. Vertical scaling upgrades the server hardware and network equipment.

Many cases show that vertical scaling by simply upgrading hardware relieves only a little real-world pressure, whereas horizontal scaling with clusters lets capacity grow freely.

 11 Use commodity systems

 As with the previous principle, an expensive high-end server does not guarantee good performance in the future. Scale out with clusters of ordinary, inexpensive servers instead.

 12 Scale out data centers

There are several design options for data centers, for example:

Hot/cold site configuration: the hot site serves all traffic, and when it fails, the cold site takes over.


Using multiple live sites is recommended instead: it costs less, and traffic can be distributed among the sites dynamically. The drawback is that it makes operations and maintenance more difficult.
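The difference between the two configurations can be sketched as "which sites receive traffic, given which ones are healthy". The site names are hypothetical, and real failover also involves DNS or load-balancer changes that this Python sketch leaves out:

```python
def hot_cold(hot: str, cold: str, healthy: set[str]) -> list[str]:
    """Hot/cold: only the hot site serves; the cold site takes over on failure."""
    if hot in healthy:
        return [hot]
    return [cold] if cold in healthy else []


def multi_live(sites: list[str], healthy: set[str]) -> list[str]:
    """Multiple live sites: every healthy site serves a share of the traffic."""
    return [site for site in sites if site in healthy]


print(hot_cold("beijing", "shanghai", healthy={"shanghai"}))  # ['shanghai']
print(multi_live(["beijing", "shanghai", "shenzhen"], {"beijing", "shenzhen"}))
# ['beijing', 'shenzhen']
```

In the hot/cold case the cold site's capacity sits idle until a failure, while with multiple live sites every site does useful work.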

 13 Use cloud technology for design

 The advantage of cloud computing is virtualization, which makes it possible to add capacity flexibly during business peaks and release it again during normal operation.

 The drawback is that applications become more tightly coupled to the virtual environment. In addition, the fault isolation discussed later (separating services onto their own physical devices) is harder to achieve in a virtualized cloud, which can interfere with troubleshooting faults in services that are supposed to be isolated.
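The "scale out at peaks, scale back afterwards" idea boils down to a small calculation of how many instances the current load needs. A Python sketch; the per-instance capacity and the minimum and maximum bounds are hypothetical parameters, not values from any particular cloud provider:

```python
import math


def desired_instances(current_load: float, load_per_instance: float,
                      minimum: int = 2, maximum: int = 20) -> int:
    """Scale out at peaks and back in afterwards, within fixed bounds."""
    needed = math.ceil(current_load / load_per_instance)
    return max(minimum, min(maximum, needed))


print(desired_instances(4500, 500))  # peak: 9 instances
print(desired_instances(300, 500))   # quiet period: back to the minimum of 2
```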

Use the right tools

14 Use the database properly

There are many kinds of databases: traditional relational databases such as Oracle and MySQL, newer non-relational (NoSQL) databases such as MongoDB, in-memory databases such as FastDB, and databases designed specifically for SSDs, such as Aerospike.

When choosing one, decide based on your own business needs, for example whether your data requires speed, safety, and so on.

 15 Firewalls, firewalls are everywhere

 Firewalls intercept and filter invalid traffic. Static content such as CSS, JS, images, and other static files usually does not need to pass through the firewall; use the firewall for critical business that involves personal information. Even a well-designed firewall costs some website performance, so apply it only where it is needed.
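One way to express this policy is a routing rule that sends only non-static requests through the firewall. In this Python sketch the list of static file suffixes is a hypothetical policy:

```python
from pathlib import PurePosixPath

# Hypothetical policy: static assets bypass the firewall; everything else
# (especially pages that handle personal information) goes through it.
STATIC_SUFFIXES = {".css", ".js", ".png", ".jpg", ".gif", ".ico"}


def needs_firewall(path: str) -> bool:
    """Return True if a request for `path` should be routed via the firewall."""
    return PurePosixPath(path).suffix.lower() not in STATIC_SUFFIXES
```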

 16 Actively use log files

 Use logs and tools to monitor the business in real time. Monitor not only the server's memory and CPU but also business data. For example, Splunk provides log collection, storage, search, and graphical display.
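Monitoring business data means turning raw log lines into business-level numbers. This Python sketch assumes a hypothetical log format of `<timestamp> <level> <event> <detail>` and hypothetical event names:

```python
from collections import Counter

# Hypothetical log lines in the format "<timestamp> <level> <event> <detail>".
LOG_LINES = [
    "2016-11-29T09:00:01 INFO order_placed id=1",
    "2016-11-29T09:00:02 INFO order_placed id=2",
    "2016-11-29T09:00:03 ERROR payment_failed id=2",
    "2016-11-29T09:00:04 INFO user_login id=7",
]


def business_metrics(lines: list[str]) -> dict[str, float]:
    """Count business events rather than only CPU or memory usage."""
    events = Counter(line.split()[2] for line in lines)
    orders = events["order_placed"]
    failures = events["payment_failed"]
    return {
        "orders": orders,
        "payment_failure_rate": failures / orders if orders else 0.0,
    }


print(business_metrics(LOG_LINES))  # {'orders': 2, 'payment_failure_rate': 0.5}
```

An alert on a sudden rise in the failure rate can catch a business problem even while every server's CPU and memory look normal.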

 Don’t do repetitive work

17 Don’t immediately check the work you just did

 For example, don't read data back immediately after writing it. Some customers do need assurance that data is complete and never lost, but that can be achieved by other means such as logging; checking by reading after every write is not recommended.

 18 Avoid redirects

 Redirects add latency and consume computing resources, so avoid them wherever possible.

 19 Relax temporal constraints

Most relational databases emphasize ACID properties, which cause problems when scaling. Appropriately relaxing timing constraints for certain business operations can therefore improve website performance.

For example, on some websites a user who books a hotel has to wait for the hotel to confirm the reservation; on some accounts, a withdrawal is completed within a stated time window rather than immediately. Both relax timing constraints, which improves website performance as well as transaction security.
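The hotel example can be sketched as accepting a booking immediately in a "pending" state and confirming it later, instead of keeping the user's request open until the hotel answers. This Python sketch uses an in-memory queue as a stand-in for whatever durable queue a real system would use; all class and method names are hypothetical:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Booking:
    booking_id: int
    status: str = "pending"


class BookingService:
    """Accept a reservation right away and confirm it asynchronously."""

    def __init__(self) -> None:
        self._queue: deque[Booking] = deque()
        self._next_id = 1

    def submit(self) -> Booking:
        booking = Booking(self._next_id)
        self._next_id += 1
        self._queue.append(booking)
        return booking  # returns immediately; confirmation comes later

    def process_pending(self) -> None:
        # Run later, e.g. by a background worker once the hotel has replied.
        while self._queue:
            self._queue.popleft().status = "confirmed"


service = BookingService()
b = service.submit()
print(b.status)  # pending
service.process_pending()
print(b.status)  # confirmed
```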

Actively utilize caching

  20 Use a CDN

A CDN can store and serve your site's content. Roughly, when a user visits the website, DNS resolution directs the request to the CDN, which then distributes user requests among its servers. Many CDN providers offer this service.

 21 Use expiration headers

 Set expiration headers for each type of object to reduce repeated requests for the same objects. Common Cache-Control directives in HTTP responses include public, no-cache, and max-age.
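A per-type policy can be as simple as a lookup table from content type to `Cache-Control` value. The lifetimes in this Python sketch are hypothetical choices, not recommendations from the book; note that `no-cache` means "revalidate before reuse", not "never store":

```python
# Hypothetical policy; max-age values are in seconds.
CACHE_POLICY = {
    "image/png": "public, max-age=2592000",   # 30 days
    "text/css": "public, max-age=604800",     # 7 days
    "text/html": "public, max-age=300",       # 5 minutes
    "application/json": "private, no-cache",  # revalidate on every use
}


def cache_control_for(content_type: str) -> str:
    """Return the Cache-Control header value for a response's content type."""
    return CACHE_POLICY.get(content_type, "no-cache")
```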

 22 Cache Ajax calls

 Set HTTP headers such as Last-Modified, Cache-Control, and Expires correctly so that Ajax responses can be cached.
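A minimal Python sketch of the server side of a cached Ajax call: it sends `Last-Modified` and `Cache-Control`, and answers a conditional request carrying `If-Modified-Since` with `304 Not Modified` when nothing has changed. The handler shape and the timestamp are hypothetical, and header names are matched case-sensitively for brevity:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical time at which the data behind the Ajax endpoint last changed.
DATA_LAST_MODIFIED = datetime(2016, 11, 29, 9, 0, 0, tzinfo=timezone.utc)


def handle_ajax(request_headers: dict[str, str]) -> tuple[int, dict[str, str]]:
    """Return (status, response headers); 304 lets the browser reuse its copy."""
    headers = {
        "Last-Modified": format_datetime(DATA_LAST_MODIFIED, usegmt=True),
        "Cache-Control": "private, max-age=60",
    }
    since = request_headers.get("If-Modified-Since")
    if since and parsedate_to_datetime(since) >= DATA_LAST_MODIFIED:
        return 304, headers  # not modified: no response body is needed
    return 200, headers
```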

 23 Use page caching

 Cache responses to earlier requests so that repeated requests are answered from the cache, reducing the load on the web servers.
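A page cache keeps the rendered output for each URL for a limited time. This Python sketch is in-process and single-threaded; `render` stands in for the web/application servers, and the 60-second TTL is a hypothetical default:

```python
import time
from typing import Callable


class PageCache:
    """Serve a rendered page from memory for `ttl` seconds."""

    def __init__(self, render: Callable[[str], str], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry and now - entry[0] < self._ttl:
            return entry[1]            # cache hit: the web server is not touched
        page = self._render(url)       # cache miss: render and store
        self._entries[url] = (now, page)
        return page
```

The `clock` parameter is injectable so expiry can be tested without waiting.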

 24 Utilize application cache

 For example, cache the data requested by particular users.
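In Python, a simple form of application cache is memoizing an expensive per-user lookup with the standard library's `functools.lru_cache`. The lookup function and its return value are hypothetical stand-ins for a real query:

```python
from functools import lru_cache

DB_CALLS = 0


@lru_cache(maxsize=1024)
def load_user_preferences(user_id: int) -> tuple[str, ...]:
    """Hypothetical expensive lookup; results are cached in-process per user."""
    global DB_CALLS
    DB_CALLS += 1  # stands in for a real database query
    return ("dark-theme", f"lang-for-{user_id}")


load_user_preferences(42)
load_user_preferences(42)  # served from the application cache
print(DB_CALLS)  # 1
```

This cache lives inside one process, so each application server has its own copy; that is the limitation rules 25 and 26 address with a dedicated object cache.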

 25 Utilizing object cache

  Object caching suits data objects that are queried repeatedly. For example, a shopping website can cache the data for its best-selling products.

 26 Put the object cache on its own layer

  Use a separate cache layer for easy expansion and maintenance.

Learn from your mistakes

 27 Active learning

  A company can produce better products only if it has a culture of learning. Learning should cover customers' business knowledge on the one hand, and technology and operations on the other.

 28 Don’t rely on QA to find errors

  The main purpose of hiring testers or quality assurance (QA) staff is to verify that the product is correct. QA can reduce costs and speed up development, because developers do not have to focus on code correctness at every moment and can leave testing to QA.

  But QA only finds problems; avoiding them in the first place is still up to developers.

 29 A design without rollback is a failed design

 Rollback here means rolling back a product release. If a release turns out to have bugs, you may need to redeploy the previous working version; without the ability to roll back, you cannot deliver a working product.

 Continuous integration is a related topic worth studying.
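The core requirement is simply that the previous working version is still available to redeploy. A minimal Python sketch of a release history with rollback; the class and version strings are hypothetical, and a real rollback also has to handle database schema changes:

```python
class Releases:
    """Keep every deployed version so a bad release can be rolled back."""

    def __init__(self) -> None:
        self._history: list[str] = []

    def deploy(self, version: str) -> None:
        self._history.append(version)

    @property
    def live(self) -> str:
        return self._history[-1]

    def rollback(self) -> str:
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()  # discard the faulty release
        return self.live     # the previous runnable version


releases = Releases()
releases.deploy("v1.4.0")
releases.deploy("v1.5.0")   # suppose this release has a bug
print(releases.rollback())  # v1.4.0
```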

 30 Discuss failures and learn from them

 Never fail twice for the same reason; a review after every failure is indispensable.

Database principles

ACID properties of relational databases:

Atomicity: a transaction is either executed completely or not executed at all.

Consistency: the data must be in a consistent state both before and after a transaction.

Isolation: each transaction behaves as if it were the only operation on the database; concurrent transactions do not interfere with each other.

Durability: once a transaction is committed, its changes are permanent.
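Atomicity and consistency can be seen with Python's built-in `sqlite3` module: in this sketch a transfer that would violate a `CHECK` constraint is rolled back as a whole, so the credit already applied to the other account disappears too. The table and account names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(amount: int) -> None:
    # `with conn` commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'bob'",
                     (amount,))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'alice'",
                     (amount,))


try:
    transfer(150)  # alice would go negative, so the CHECK constraint fails
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('alice', 100), ('bob', 0)] : bob's credit was rolled back as well
```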

 31 Beware of costly relationships

 Refine the table structure during the design phase; once development has started, adding columns can be very costly.

 32 Use the correct database lock

 Databases use many kinds of locks, such as implicit locks, explicit locks, row locks, page locks, range locks, table locks, and database locks.

 Improper use of locks reduces the website's throughput.

 33 Do not use multi-phase commit

 Two-phase commit, for example, first has the participants vote and then commits. This reduces scalability because no other operations can proceed until the commit completes.

 34 Do not use SELECT ... FOR UPDATE

 The FOR UPDATE clause locks the selected rows, which slows down transaction processing.
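A common alternative to locking rows on read is optimistic concurrency control with a version column. This technique is not prescribed by the book; it is shown here as one way to avoid holding row locks. The update succeeds only if the row has not changed since it was read. A Python sketch using `sqlite3`, with a hypothetical table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER, version INTEGER)")
conn.execute("INSERT INTO stock VALUES ('phone', 10, 1)")
conn.commit()


def decrement(item: str, expected_version: int) -> bool:
    """Succeed only if nobody changed the row since we read `expected_version`."""
    with conn:
        cur = conn.execute(
            "UPDATE stock SET qty = qty - 1, version = version + 1 "
            "WHERE item = ? AND version = ?",
            (item, expected_version),
        )
    return cur.rowcount == 1


_, version = conn.execute("SELECT qty, version FROM stock WHERE item = 'phone'").fetchone()
print(decrement("phone", version))  # True: first writer wins
print(decrement("phone", version))  # False: stale version, re-read and retry
```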

 35 Don’t select all the data

 For example, select * from xxx;

 The first problem with this approach is that it does not tolerate schema changes. For example, if a table originally has four columns and the processing code is written directly against them, adding a column causes errors. It also retrieves data you do not need.

 The same applies to insert into xxx values(xxxx);

 Without an explicit column list, this statement also fails once the columns no longer match.
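A short `sqlite3` sketch of the difference: after a column is added, statements with explicit column lists keep working, while a positional `INSERT` written for the old table fails. The table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")

# Later, the schema grows by one column.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

# Explicit column lists keep working after the change.
conn.execute("INSERT INTO users (id, name) VALUES (2, 'bob')")
rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(rows)  # [(1, 'alice'), (2, 'bob')]

# A positional INSERT written for the old two-column table now fails.
positional_insert_failed = False
try:
    conn.execute("INSERT INTO users VALUES (3, 'carol')")
except sqlite3.OperationalError:
    positional_insert_failed = True
print(positional_insert_failed)  # True
```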

 Fault-tolerant design and fault control

 36 Use "swim lanes" to isolate faults

  There are many ways to divide services and data, such as containers, clusters, pools, shards, and swim lanes. A swim lane gives each business its own isolated domain, and calls across swim lanes are not allowed, so a failure in one lane does not spread to the others.

  37 Never trust a single point of failure

  Many systems are designed with single points. If the whole system depends on one such module, a failure of that single point brings down the entire system.

 38 Avoid putting systems in series

 For example, suppose a system consists of several components, each with 99.9% availability. When 3 of them are connected in series, the availability of the whole system drops to about 99.7%.
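The 99.7% figure follows from multiplying the availabilities of the components, since a serial system is up only when every component is up (assuming the components fail independently). A Python sketch:

```python
import math


def serial_availability(components: list[float]) -> float:
    """Components in series: availabilities multiply (independent failures)."""
    return math.prod(components)


print(round(serial_availability([0.999] * 3), 4))   # 0.997
print(round(serial_availability([0.999] * 10), 4))  # 0.99
```

Each additional component in the chain lowers overall availability, which is why long serial call chains should be avoided.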

 39 Ensure that functions can be enabled/disabled

 Features that depend on shared libraries or third-party services should be able to be turned on or off.
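A minimal feature-flag sketch in Python: a feature backed by a third-party service can be switched off so the site degrades gracefully instead of failing. The flag store and the recommendation function are hypothetical; in practice the flags might live in a configuration service so they can be changed without redeploying:

```python
# Hypothetical flag store.
FLAGS = {"recommendations": True}


def fetch_recommendations(user_id: int) -> list[str]:
    # Stands in for a call to a third-party recommendation service.
    return [f"item-for-{user_id}"]


def recommendations_for(user_id: int) -> list[str]:
    if not FLAGS.get("recommendations", False):
        return []  # feature disabled: degrade gracefully
    return fetch_recommendations(user_id)


FLAGS["recommendations"] = False  # e.g. the third-party service is down
print(recommendations_for(7))     # []
```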

 Avoid or distribute state

  40 Strive to achieve statelessness

  Maintaining state limits scalability and increases costs.

  41 Maintain sessions on the browser side as much as possible

  This reduces the load on the servers, and it allows any request to be sent to any server.
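One way to keep a session in the browser is a signed cookie: the server serializes the session data and appends an HMAC, so any server holding the shared secret can verify it without server-side session storage. This Python sketch uses only the standard library; the secret and the cookie layout are hypothetical:

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical secret shared by every web server, so any server can verify
# a session cookie that another server issued.
SECRET = b"replace-with-a-real-secret"


def encode_session(data: dict) -> str:
    """Serialize session data and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def decode_session(cookie: str) -> Optional[dict]:
    """Return the session data, or None if the cookie was tampered with."""
    payload, _, signature = cookie.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

The cookie is signed, not encrypted: the browser can read its contents, so it should hold only non-sensitive data, and a real implementation would also add an expiry time.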

 42 Use a distributed cache to store state

  Use an independent cache layer so that it can be scaled easily. There are many distributed caching solutions, such as Memcached.

Asynchronous communication and message bus

 43 Use asynchronous communication as much as possible

 Asynchronous communication keeps services and layers independent of each other, which reduces coupling and makes the system easier to scale.
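A minimal Python sketch of asynchronous communication between two services: the order service puts an event on a queue and returns immediately, and a separate worker consumes it. The in-process `queue.Queue` stands in for a real message bus, and the service names are hypothetical:

```python
import queue
import threading

order_events: queue.Queue = queue.Queue()
emails_sent: list[str] = []


def place_order(order_id: str) -> str:
    # Publish an event and return at once; the order service does not wait
    # for, or even know about, the email service.
    order_events.put(order_id)
    return "accepted"


def email_worker() -> None:
    while True:
        order_id = order_events.get()
        if order_id is None:  # sentinel value: shut down
            break
        emails_sent.append(f"confirmation for {order_id}")


worker = threading.Thread(target=email_worker)
worker.start()
print(place_order("A-1"), place_order("A-2"))  # accepted accepted
order_events.put(None)
worker.join()
print(emails_sent)  # ['confirmation for A-1', 'confirmation for A-2']
```

If the email service slows down, orders are still accepted; the events simply wait in the queue.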

 44 Ensure that the message bus can scale

  Prefer Y-axis or Z-axis splits, that is, scale the bus according to business needs and functions. Simply copying or cloning it increases the number of listeners for each message subscriber, whereas splitting it by business separates the load of different businesses.

 45 Avoid overcrowding the message bus

 Weigh the value of each message against its cost, and do not put everything on the bus.


Other Principles

 46 Be cautious about scaling through third-party solutions

 When a company runs into a problem, it may turn to a third party to solve it urgently. But that is not a long-term solution: the provider has many customers, and your crisis is not their crisis, so they cannot be counted on to deliver at the critical moment. Companies should therefore keep a certain degree of control themselves (what a fancy-sounding phrase!).

 47 Purging, archiving, and cost-effective storage

  Data that is no longer needed should be deleted regularly. Data of little value should be archived regularly and then removed from primary storage. Valuable data should be backed up and kept quickly accessible.

 48 Remove business intelligence from transaction processing

 Keep business intelligence (reporting and analysis) separate from the transaction-processing product system to improve the product's scalability.

  This prevents the system architecture from constraining the business as it grows.

  49 Design applications that can be monitored

 You should design a global monitoring strategy that can answer these questions:

  "Did a problem occur?"

  "Where did the problem occur?"

  "What problem occurred?"

  "Will a problem occur?"

  "Can it be repaired automatically?"


50 Be competent

  Architects should be involved in every design rather than relying entirely on third-party solutions.

  A simple, excellent architecture is small and refined. Relying solely on open source to build the architecture may solve the problem, but it leads to bloated applications.

