How can a PHP website handle big data, heavy traffic, and high concurrency?

无忌哥哥
Original, 2018-06-27 14:44:16

1: Hardware aspect

An ordinary P4 server can support up to about 100,000 IPs per day. If daily visits exceed 100,000, a dedicated server is needed. If the hardware is underpowered, no amount of software optimization will help. The main factors that affect server speed are the network, hard disk read and write speed, memory size, and CPU processing speed.

2: Software aspect

The first thing to address is the database. It must have a good structure to begin with. Avoid using * in queries, avoid correlated subqueries, and add indexes on frequently queried and sorted columns so that sequential access replaces non-sequential access. If conditions permit, it is generally best to run the MySQL server on Linux.

As for Apache versus Nginx, Nginx is recommended under high concurrency; it is a good alternative to the Apache server and consumes less memory. Official tests show support for 50,000 concurrent connections, and real production environments reach 20,000 to 30,000.

In PHP, disable unnecessary modules as far as possible and use memcached. Memcached is a high-performance distributed memory object caching system that serves data directly from memory without going to the database, which greatly improves speed.

Enable GZIP compression in IIS or Apache to compress website content and greatly reduce website traffic.
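The memcached advice above is usually applied as a "cache-aside" lookup: check the cache first and query the database only on a miss. The following is a minimal Python sketch of that idea, not a memcached client. `TTLCache`, `get_article`, and the `loader` callback are hypothetical names introduced for illustration, and the in-process dictionary only stands in for a real cache server.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a cache server (assumption: get/set with expiry)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry is too old: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def get_article(cache, article_id, loader, ttl_seconds=60):
    """Return the article from the cache, calling loader (the database) only on a miss."""
    key = f"article:{article_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    article = loader(article_id)  # hypothetical database query
    cache.set(key, article, ttl_seconds)
    return article
```

With this shape, repeated requests for the same article within the expiry window reach the database only once.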

Second, prohibit external hotlinking.

Hotlinking of images or files by external websites often adds considerable load, so external links to your own images and files should be strictly limited. Fortunately, hotlinking can currently be controlled simply by checking the HTTP Referer header: Apache can block hotlinks through its own configuration, and IIS has third-party ISAPI filters that do the same. Of course, the Referer header can be forged in code to get around this, but few people currently go to that trouble. You can ignore such cases or handle them by non-technical means, such as adding a watermark to images.
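The Referer check described above can be sketched in a few lines of Python. This illustrates the decision logic only; it is not Apache or IIS configuration. The `is_hotlink` function and the `ALLOWED_HOSTS` set are assumptions made for this example. Treating an empty Referer as allowed is also an assumption, since some browsers and proxies omit the header.

```python
from urllib.parse import urlparse

# Assumption: the site's own host names; replace with real domains.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_hotlink(referer, allowed_hosts=ALLOWED_HOSTS, allow_empty=True):
    """Return True when the request appears to come from another site's page."""
    if not referer:
        # No Referer header at all: allowed or blocked depending on policy.
        return not allow_empty
    host = urlparse(referer).hostname
    # An unparseable Referer or a foreign host counts as a hotlink.
    return host is None or host not in allowed_hosts
```

A request handler would serve the image when `is_hotlink(...)` is false and return an error or placeholder image otherwise.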

Third, control the download of large files.

Downloading large files consumes a lot of traffic, and on non-SCSI hard drives, heavy file downloading also consumes CPU, which reduces the website's responsiveness. Therefore, try not to offer downloads of files larger than 2 MB. If they are required, it is recommended to place the large files on another server.

Fourth, use different hosts to divert the main traffic.

Place files on different hosts and offer users different mirrors to download from. For example, if you find that RSS files take up a lot of traffic, use a service such as FeedBurner or FeedSky to host the RSS output elsewhere. Most of the traffic from other people's requests will then fall on FeedBurner's hosts, and RSS will not take up too many of your resources.

Fifth, use traffic analysis and statistics software.

Installing traffic analysis and statistics software on the website shows you immediately where most of the traffic is consumed and which pages need optimizing. Solving traffic problems therefore requires accurate statistical analysis, for example with Google Analytics.
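To make "where most of the traffic is consumed" concrete, here is a small Python sketch that totals bytes served per URL. It assumes a simplified, hypothetical log format of `path status bytes` per line. Real access logs have more fields, and the `bytes_by_path` helper is introduced here only for illustration.

```python
from collections import Counter


def bytes_by_path(log_lines):
    """Sum the bytes sent per path and return (path, total) pairs, largest first."""
    totals = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed format
        path, _status, size = parts
        if size.isdigit():
            totals[path] += int(size)
    return totals.most_common()
```

The first entries of the result are the pages or files worth optimizing or moving to another host.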

Constraints for high concurrency and high load: hardware, deployment, operating system, Web server, PHP, MySQL, testing


Deployment: server separation, database clustering with database and table hashing, mirroring, and load balancing
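Table hashing, as mentioned above, means spreading rows across several tables (or databases) by hashing a key. Below is a minimal Python sketch that assumes 16 tables named `user_00` to `user_15`. The function name, the table prefix, and the choice of CRC32 are illustrative assumptions and do not come from the article.

```python
import zlib


def table_for_user(user_id, table_count=16, prefix="user_"):
    """Map a user ID to one of table_count tables using a stable hash."""
    digest = zlib.crc32(str(user_id).encode("utf-8"))
    return f"{prefix}{digest % table_count:02d}"
```

Because the hash is stable, the same user always maps to the same table. Changing `table_count` later remaps most keys, so it should be chosen with future growth in mind.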

Load balancing classification: 1) DNS round robin; 2) proxy server load balancing; 3) address translation gateway load balancing; 4) NAT load balancing; 5) reverse proxy load balancing; 6) hybrid load balancing
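Most of the methods listed rest on the same rotation idea as round robin: hand each new request to the next server in turn. Here is a minimal Python sketch of that rotation. `RoundRobinBalancer` and the backend names are hypothetical, and real balancers also weigh server capacity and health.

```python
import itertools


class RoundRobinBalancer:
    """Hands out backends in a fixed rotating order."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def next_backend(self):
        """Return the backend that should receive the next request."""
        return next(self._cycle)
```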


Deployment plan 1:

Scope of application: websites and application systems with static content as the main body; websites and application systems with high system security requirements.

Main server:

Carries the main running load of the program and handles dynamic requests in the website or application system;

Pushes static pages to multiple publishing servers;

Pushes attachment files to the file server;

For mainly static websites with high security requirements, this server can be placed on the intranet to block access from the external network.

DB server (database server):

Carries the database read and write load;

Exchanges data only with the main server and blocks external network access.

File/video server:

Hosts the data streams that consume the most system and bandwidth resources;

Serves as the storage repository for reading and writing large attachments;

As a video server, it provides automatic video processing capabilities.

Publishing server group:

Is responsible only for publishing static pages and carries the vast majority of web requests;

Is load-balanced through Nginx.

Deployment plan 2:

Scope of application: websites or application systems with dynamic interactive content as the main body; websites or application systems with heavy load pressure and sufficient budget;

Web server group:

The web servers have no master-slave relationship; they form a parallel, redundant design;

Load balancing is achieved through a front-end load balancing device or an Nginx reverse proxy;

Dedicated file/video servers are split off to separate light and heavy traffic effectively;

Each web server can connect to all databases through DEC, and the databases are divided into masters and slaves.

Database server group:

Bears a relatively balanced database read and write load;

Synchronizes data across multiple databases by mapping the databases' physical files.

Shared disk/disk array:

Used for unified reading and writing of the databases' physical files;

Serves as the storage repository for large attachments;

Ensures the I/O efficiency and data security of the overall system through the balancing and redundancy of its own physical disks.

Features of the solution:

Web load is distributed sensibly through front-end load balancing;

Light and heavy data streams are distributed sensibly by separating the file/video servers from the regular web servers;

Database I/O load is distributed sensibly through the database server group;

Each web server usually connects to only one database server. Through DEC's heartbeat detection, it can switch automatically to a redundant database server in a very short time;

The introduction of disk arrays greatly improves system I/O efficiency and greatly enhances data security.
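The article does not explain how DEC performs its heartbeat detection. The following Python sketch therefore only illustrates the general failover idea: try the preferred database first and fall back to a redundant one when the health check fails. `pick_database` and the `is_alive` callback are hypothetical names, and a real health check would attempt a connection or a ping.

```python
def pick_database(servers, is_alive):
    """Return the first server in priority order that passes the health check."""
    for server in servers:
        if is_alive(server):
            return server
    raise RuntimeError("no database server is reachable")
```

Called with `["db-primary", "db-standby"]`, this returns the standby only while the primary's health check fails.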

Web server:

A large part of a web server's resource usage comes from processing web requests; under normal circumstances this is the load generated by Apache. Under high concurrency, Nginx is a good alternative to the Apache server. Nginx ("engine x") is a high-performance HTTP and reverse proxy server developed in Russia. In China, many websites and channels use Nginx, including Sina, Sohu Pass, NetEase News, NetEase Blog, Kingsoft Xiaoyao.com, Kingsoft iPowerWord, Xiaonei.com, YUPOO Photo Album, Douban, and Xunlei Kankan.

Advantages of Nginx:

High concurrent connections: official tests show support for 50,000 concurrent connections, and real production environments reach 20,000 to 30,000.

Low memory consumption: under 30,000 concurrent connections, 10 running Nginx processes consume only 150 MB of memory (15 MB × 10 = 150 MB).

Built-in health check function: If a web server in the backend of Nginx Proxy goes down, front-end access will not be affected.

Strategy: instead of the older Apache, choose Lighttpd or Nginx, web servers that use fewer resources and handle higher load.

MySQL:

MySQL itself has a strong load capacity. MySQL optimization is a complicated task because it ultimately requires a good understanding of system optimization. Database work involves a large number of short queries, reads, and writes. Besides software techniques that need attention during development, such as indexing and improving query efficiency, the main hardware factors affecting MySQL's execution efficiency are disk seeks, disk I/O levels, CPU cycles, and memory bandwidth.

Optimize MySQL according to the hardware and software conditions of the server. The core of MySQL optimization lies in how system resources are allocated, which does not mean giving MySQL more resources without limit. The most noteworthy parameters in the MySQL configuration file are:

Change the index buffer size (key_buffer)

Change the buffer size used for sequential table scans (read_buffer_size)

Set the maximum number of open tables (table_cache)

Set the time threshold for slow queries (long_query_time)

If conditions permit, it is generally best to install the MySQL server on Linux rather than on FreeBSD.

Strategy: MySQL optimization requires a plan tailored to the database read and write characteristics of the business system and to the server's hardware configuration; a MySQL master-slave structure can be deployed as needed.

PHP:

1. Load as few modules as possible;

2. On the Windows platform, try to use IIS or Nginx instead of the usual Apache;

3. Install an accelerator (these improve the execution speed of PHP code by caching the precompiled results of PHP code and database results):

eAccelerator: eAccelerator is a free, open source PHP accelerator. Through optimization and dynamic content caching, it improves the caching performance of PHP scripts so that the overhead of compiling them on the server is almost completely eliminated.

APC: Alternative PHP Cache (APC) is a free, open opcode cache for PHP. It provides a free, open, and robust framework for caching and optimizing PHP intermediate code.

memcache: memcache is a high-performance, distributed memory object caching system developed by Danga Interactive, used to reduce database load and improve access speed in dynamic applications. Its main mechanism is to maintain one large, unified hash table in memory. Memcache can store data in various formats, including images, videos, files, and database query results.

XCache: an opcode cache developed by Chinese developers.

Strategy: install an accelerator for PHP.

Proxy Server (Cache Server):

Squid Cache (Squid for short) is a popular free software (GNU General Public License) proxy server and web caching server. Squid has a wide range of uses: acting as a front-end cache that speeds up web servers by caching repeated requests, caching web, DNS, and other network lookups for a group of people who share network resources, aiding network security by filtering traffic, and letting a LAN reach the internet through a proxy. Squid is designed primarily to run on Unix-like systems.

Strategy: installing a Squid reverse proxy server can greatly improve server efficiency.

Stress testing: stress testing is a basic quality assurance activity and part of every serious software testing effort. The basic idea is simple: instead of running manual or automated tests under normal conditions, you run them under conditions where the number of computers is small or system resources are scarce. Resources that are typically stress tested include memory, CPU availability, disk space, and network bandwidth. The stress is generally applied through concurrent requests.

Stress testing tools: webbench, ApacheBench, etc.
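To show what concurrency-based stress testing measures, here is a small Python sketch that runs a callable many times from a pool of threads and reports failures and timings. It is not a replacement for webbench or ApacheBench. `run_load` is a hypothetical helper, and `task` stands in for whatever performs one request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(task, total_requests, concurrency):
    """Call task() total_requests times using `concurrency` worker threads."""

    def timed_call(_):
        start = time.perf_counter()
        try:
            task()
            ok = True
        except Exception:
            ok = False  # count any exception as a failed request
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    durations = [elapsed for _, elapsed in results]
    return {
        "requests": total_requests,
        "failures": sum(1 for ok, _ in results if not ok),
        "avg_seconds": sum(durations) / len(durations) if durations else 0.0,
        "max_seconds": max(durations, default=0.0),
    }
```

Raising `concurrency` step by step and watching where failures or the maximum latency start to climb gives a rough picture of the system's limit.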

Vulnerability testing: the vulnerabilities in our own system mainly include SQL injection and XSS (cross-site scripting) attacks. Security also covers system software, such as vulnerabilities in the operating system, MySQL, and Apache, which can generally be fixed by upgrading.

Vulnerability testing tool: Acunetix Web Vulnerability Scanner
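For SQL injection in particular, the usual defense is to pass user input as a bound parameter instead of concatenating it into the SQL string. Here is a minimal Python sketch that uses the standard library's `sqlite3` module as a stand-in for MySQL. The `users` table, `make_demo_db`, and `find_user` are assumptions made for this example.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with one example user (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "alice"))
    conn.commit()
    return conn


def find_user(conn, username):
    """Look up a user by name; the ? placeholder keeps input out of the SQL text."""
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchall()
```

With the placeholder, an input such as `' OR '1'='1` is compared as a literal name and matches nothing, whereas string concatenation would turn it into part of the query.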
