
10 Issues That Must Be Considered When Designing Large-Scale Website Architecture

WBOY · Original · 2016-05-16 16:37:13

We are not discussing PHP, JSP, or .NET environments here; we look at the problem from the perspective of architecture. The implementation language is not the issue: a language's strengths lie in how things are implemented, not in the quality of the architecture. Whichever language you choose, these architectural problems must be faced.

 1. Processing of massive data

As we all know, for relatively small sites the data volume is not large; plain SELECTs and UPDATEs solve the problems we face, the load itself is modest, and adding a few indexes handles most of it. For a large website, however, the data may grow by millions of rows per day. A poorly designed many-to-many relationship causes no trouble early on, but as the number of users increases, the data volume grows geometrically. At that point the cost of a SELECT or UPDATE on a single table, to say nothing of a multi-table join, becomes very high.
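One common response is horizontal sharding: split the huge table into N physical tables and route each row by a stable hash of its key, so no single table absorbs all the growth. The sketch below is illustrative only; the shard count, table names, and a "follows" relation are assumptions, not anything from the original design.

```python
# Hypothetical sketch: route rows of a huge many-to-many "follows" table
# to one of NUM_SHARDS physical tables by hashing the user id, so no
# single table has to absorb millions of new rows per day.
import hashlib

NUM_SHARDS = 16  # assumed shard count


def shard_for(user_id: int) -> str:
    """Pick a physical table for this user's rows via a stable hash."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return f"follows_{int(digest, 16) % NUM_SHARDS:02d}"
```

Because the hash is stable, the same user always lands on the same shard, and queries scoped to one user touch only one table.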

 2. Data concurrency processing

In some cases, Web 2.0 CTOs reach for a trump card: caching. But caching is itself a big problem under high concurrency. The cache is shared globally across the application, and when two or more requests try to update the same cache entry at the same time, the application can fall over outright. This calls for a sound data-concurrency strategy and caching strategy.
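A minimal sketch of one such strategy, assuming an in-process cache: serialize concurrent updates through a lock so a read-modify-write on a shared entry happens as one atomic step. A real multi-server cache would need more, but the same principle applies.

```python
# Minimal sketch (assumed design, in-process only): guard a shared cache
# with a lock so two concurrent requests updating the same key cannot
# interleave and corrupt each other's read-modify-write.
import threading


class SafeCache:
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def update(self, key, fn, default=0):
        # The read, the transform, and the write happen under one lock.
        with self._lock:
            self._data[key] = fn(self._data.get(key, default))
            return self._data[key]
```

Without the lock, two requests could both read the old value and one increment would be lost.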

In addition, there is the problem of database deadlock. We may not notice it in normal operation, but under high concurrency the probability of deadlock is very high. Disk caching is a big problem as well.
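Since deadlocks under load cannot be fully prevented, a common mitigation is to retry the failed transaction a few times with a short backoff. The sketch below is hedged: `DeadlockError` stands in for whatever exception your actual database driver raises on deadlock or serialization failure.

```python
# Hedged sketch: retry a transaction when the database reports a deadlock.
# DeadlockError is a stand-in for the driver-specific exception.
import time


class DeadlockError(Exception):
    pass


def run_with_retry(txn, retries=3, backoff=0.05):
    """Run txn(); on deadlock, back off briefly and retry up to `retries` times."""
    for attempt in range(retries):
        try:
            return txn()
        except DeadlockError:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(backoff * (attempt + 1))
```

This only helps if the transaction is safe to re-run from the start, which is worth designing for explicitly.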

 3. Problems of file storage

For 2.0 sites that support file upload, even as we celebrate ever-larger hard disks, we should think harder about how files are stored and effectively indexed. A common solution is to store files by date and type. But once the file volume becomes massive, say a single disk holding 500 GB of small scattered files, disk I/O becomes a huge problem during maintenance and use. Even if your bandwidth is sufficient, the disk may not keep up, and if uploads are hitting it at the same time, it is easily overwhelmed.
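A variant of the store-by-date-and-type idea is to derive a shallow directory tree from a hash of the file name, so no single directory ever accumulates millions of entries. The root path and naming scheme below are illustrative assumptions.

```python
# Illustrative sketch: spread uploaded files across a two-level directory
# tree derived from a hash, so no single directory holds millions of files.
# The root path "/data/uploads" is an assumption, not a real layout.
import hashlib
import os


def storage_path(filename: str, root: str = "/data/uploads") -> str:
    h = hashlib.sha1(filename.encode()).hexdigest()
    # e.g. <root>/ab/cd/<hash>_<filename>
    return os.path.join(root, h[:2], h[2:4], f"{h}_{filename}")
```

With two levels of 256 buckets each, files spread over 65,536 directories, which keeps directory scans and backups manageable.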

RAID and dedicated storage servers may solve the immediate problem, but access from different regions remains. Our server may be in Beijing while users are in Yunnan or Xizang, so how do we keep access fast? And if we go distributed, how should our file index and architecture be planned?

So we have to admit that file storage is a very difficult problem.

 4. Processing of data relationships

We can easily design a database that conforms to third normal form, full of many-to-many relationships, and even use GUIDs in place of IDENTITY columns. But in the 2.0 era, where many-to-many relationships abound, third normal form is the first thing to discard: multi-table joins must be cut to an absolute minimum.
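Discarding strict normalization usually means paying at write time to save at read time, for example keeping a redundant follower count on each user so reads never scan the many-to-many relation. The in-memory model below is a deliberate simplification, not a real schema.

```python
# Sketch of deliberate denormalization (assumed, simplified in-memory model):
# keep a redundant follower_count per user so reads never need to count
# rows in the many-to-many follows relation.
class Users:
    def __init__(self):
        self.follower_count = {}   # user_id -> redundant counter
        self.follows = set()       # (follower_id, followee_id) pairs

    def follow(self, follower, followee):
        if (follower, followee) not in self.follows:
            self.follows.add((follower, followee))
            # Update the denormalized counter in the same logical transaction,
            # otherwise the redundant copy drifts out of sync.
            self.follower_count[followee] = self.follower_count.get(followee, 0) + 1
```

The trade-off is exactly the one the paragraph describes: redundant data that must be kept consistent, in exchange for reads that never join.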

5. Data index problem

As we all know, an index is the cheapest and easiest way to improve database query efficiency. But under heavy UPDATE load, the cost of maintaining it on every update and delete becomes unimaginably high. The author once encountered an update touching a clustered index that took ten minutes to complete, which a live site simply cannot bear.
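One way to soften the conflict is to batch many small updates into a single transaction: the index is still maintained per row, but the commit overhead is paid once instead of once per row. A runnable toy with Python's built-in `sqlite3` (chosen here only for illustration):

```python
# Toy illustration with sqlite3: an indexed column means every UPDATE also
# maintains the index; batching the updates in one transaction at least
# amortizes the per-commit cost.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, score INTEGER)")
conn.execute("CREATE INDEX idx_score ON posts(score)")  # index every UPDATE must maintain
conn.executemany("INSERT INTO posts (id, score) VALUES (?, 0)",
                 [(i,) for i in range(1000)])

# One transaction for the whole batch instead of 1000 auto-commits.
with conn:
    conn.executemany("UPDATE posts SET score = score + 1 WHERE id = ?",
                     [(i,) for i in range(1000)])
```

For truly hot counters, many sites go further and keep them out of the indexed table entirely, flushing periodically.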

Indexing and updating are natural enemies. Issues 1, 4, and 5 (data volume, data relationships, and indexing) are the ones we must weigh when designing the architecture, and they may well consume the most time.

 6. Distributed processing

Because 2.0 websites are highly interactive, a CDN does essentially nothing for them: content updates in real time and cannot be served by conventional static caching. To guarantee access speed everywhere, we face a huge problem: how to synchronize and update data effectively. Real-time communication between servers in different regions is an issue that must be considered.
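One common shape for that synchronization is change fan-out: every write is published once, and each region's replica applies it. The sketch below uses an in-memory stand-in for what would really be an asynchronous message broker; the region names and design are assumptions for illustration.

```python
# Toy fan-out sketch (assumed design): each write is published once and
# applied to every region's replica. In production the loop below would be
# an asynchronous message broker, with the delay the text describes.
class ChangeBus:
    def __init__(self, regions):
        self.replicas = {name: {} for name in regions}

    def publish(self, key, value):
        # Synchronous here for clarity; real replication is async and lagged.
        for replica in self.replicas.values():
            replica[key] = value
```

The hard part the section points at is exactly what this toy hides: the broker is asynchronous, so replicas lag, and the application must tolerate that lag.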

7. Ajax pros and cons analysis

What makes AJAX succeed is also what makes it fail. AJAX has become the mainstream, and suddenly GET and POST over XMLHttpRequest look easy: the client fetches or posts data, the server receives the request and returns a response, and that is a normal AJAX exchange. But inspect that exchange with a packet-capture tool and the returned data and its handling are laid bare. For computationally expensive AJAX requests, an attacker can build a packet-sending machine and easily kill a web server.
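A standard defense against such replayed floods is per-client rate limiting, for example a token bucket in front of the expensive endpoint. This is a generic sketch, not anything the original proposes; capacity and refill rate are assumptions to tune.

```python
# Hedged sketch: a per-client token bucket that rejects requests once a
# client exceeds its budget, blunting a "packet sending machine" attack.
import time


class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = capacity          # start full
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice one bucket is kept per client key (IP, session, or account), and rejected requests get an HTTP 429 before any expensive work runs.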

8. Analysis of data security

Under HTTP, data packets travel in clear text. We might say we can encrypt, but the encryption runs on the client and can be reverse engineered (as with QQ, whose scheme can be analyzed well enough to write an equivalent encrypt/decrypt routine). While your site's traffic is small, nobody bothers you; as it grows, so-called plug-ins and mass-messaging bots follow one after another (the spam waves in QQ's early days show the pattern). We may say we can apply stronger server-side checks or even HTTPS, but note that these carry massive database, I/O, and CPU costs, and against determined mass senders they are basically insufficient. The author has managed mass messaging against Baidu Space and QQ Zone; if you care to try, it is actually not hard.
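One cheap server-side check worth naming concretely is request signing: the server issues a shared secret and rejects requests whose HMAC does not verify. This raises the bar against naive replay tooling but, as the paragraph warns, does not hide the payload (that needs HTTPS) and does not stop a determined attacker who extracts the secret. Names and parameters below are assumptions.

```python
# Sketch: sign each request with a shared secret so the server can reject
# tampered or mass-produced requests. This does NOT hide the payload; the
# secret and payload format here are illustrative assumptions.
import hashlib
import hmac

SECRET = b"server-side-secret"  # assumed shared secret


def sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def verify(payload: str, signature: str) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign(payload), signature)
```

Verification is a single hash per request, so the CPU cost stays far below full payload encryption.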

 9. Data synchronization and cluster processing issues

When one database server is overwhelmed, we need database-level load balancing and clustering, and that may be the most troublesome problem of all. Data travels over the network, and depending on the database design, replication delay is a terrible and unavoidable problem. We then need other means to guarantee effective interaction within that delay of a few seconds or longer, such as data hashing, partitioning, and content-level handling.
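The "data hashing" the section mentions is often done with a consistent-hash ring: keys map to positions on a ring, and adding or removing a database node only moves the keys adjacent to it, limiting the resynchronization a cluster change triggers. The implementation below is a generic illustration, not the author's design; the virtual-node count is an assumption.

```python
# Illustrative consistent-hash ring: when a node joins or leaves, only the
# keys adjacent to it on the ring move to a different node. The vnode count
# is an assumed tuning parameter that smooths the key distribution.
import bisect
import hashlib


class HashRing:
    def __init__(self, nodes, vnodes=50):
        self._ring = []
        for node in nodes:
            for i in range(vnodes):
                h = int(hashlib.md5(f"{node}#{i}".encode()).hexdigest(), 16)
                self._ring.append((h, node))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        idx = bisect.bisect(self._keys, h) % len(self._ring)  # wrap around the ring
        return self._ring[idx][1]
```

Compared with plain `hash(key) % N`, where changing N remaps almost every key, this keeps most keys in place across topology changes.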

 10. Data sharing channels and OPENAPI trends

An open API has become an inevitable trend. From Google, Facebook, and MySpace to domestic social networks, everyone is weighing this question: an open API retains users more effectively, stimulates more of their interest, and attracts more outside developers to do the most effective development for you. At that point an effective data-sharing platform and open-data platform become indispensable, and guaranteeing data security and performance while exposing open interfaces is yet another issue we must consider seriously.
