
Designing for highly concurrent writes to MySQL

WBOY (Original)
2016-08-04 09:19:37 · 1,507 views

I recently developed a project. Each client submits 100 rows of data to the server every 10 seconds, and the server checks for duplicates before writing them.
There are roughly tens of thousands of clients, the submitted data is concentrated in time, and reads do not need to be considered.
The current design is:
The database is split into per-client tables, so the amount of data in each table stays low.
After the server receives the data, it first pushes it into a Redis queue, and a scheduled task later inserts it into the database.
My questions are:
1. Can the interface the server exposes to the clients handle thousands of clients posting data at the same time (each client submits once every 10 seconds)?
2. The data is first stored in a Redis queue. If hundreds of thousands or millions of entries accumulate, will Redis stay stable?
The basic goal is to ensure that the server can keep providing service normally.

------------------------ Supplementary content ------------------------
The project mainly collects user data; the client starts automatically when the computer is turned on.
Each submission contains 100 items and is sent every 10 seconds. A user generally submits no more than 10 times a day, i.e. at most about 1,000 rows of data.
Each row contains five or six key-value pairs and is under 100 characters.
Daily data must be complete. Multiple clients may collect the same user's data, so duplicates need to be avoided.

The current plan is:
The data tables are split per user.
Data submitted by a user is first stored in a Redis queue keyed by user, i.e. one queue per user per day. Once the data has been persisted to the database, the queue is deleted (see the sketch below).
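
A minimal sketch of the per-user daily queue described above, assuming the redis-py client and a hypothetical key scheme of queue:{user_id}:{date}; the field names are made up for illustration:

    import datetime
    import json

    import redis  # assumes the redis-py client is installed

    r = redis.Redis(host="localhost", port=6379, db=0)

    def enqueue_batch(user_id, rows):
        """Push one client submission (~100 rows) onto that user's daily queue."""
        # Hypothetical key scheme: one list per user per day, e.g. queue:42:2016-08-04
        key = "queue:%s:%s" % (user_id, datetime.date.today().isoformat())
        pipe = r.pipeline()
        for row in rows:
            pipe.rpush(key, json.dumps(row))
        # Safety net: expire the key after two days in case the consumer ever misses it
        pipe.expire(key, 2 * 24 * 3600)
        pipe.execute()

    # Example: one submission of small key-value rows
    enqueue_batch(42, [{"user_id": 42, "item_key": "k%d" % i,
                        "item_value": "v", "collected_on": "2016-08-04"}
                       for i in range(100)])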

Reply content:


  1. Merge inserts instead of inserting one row at a time. For example, combine up to 1,000 rows into a single INSERT statement; this reduces the number of round trips to the database (see the sketch after this list).

  2. If the table only needs simple inserts and queries and does not require transaction support, consider using the MyISAM engine; compared with InnoDB it can give higher insert performance.
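
A minimal sketch of a merged insert, assuming the PyMySQL driver and a hypothetical user_data table with a UNIQUE KEY used for de-duplication (INSERT IGNORE silently skips rows that would violate it):

    import pymysql  # assumes the PyMySQL driver; credentials and schema are made up

    conn = pymysql.connect(host="localhost", user="app", password="secret",
                           database="collector", charset="utf8mb4")

    def bulk_insert(rows):
        """Insert up to ~1,000 rows in one round trip; duplicates are skipped thanks to
        a hypothetical UNIQUE KEY on (user_id, item_key, collected_on)."""
        sql = ("INSERT IGNORE INTO user_data (user_id, item_key, item_value, collected_on) "
               "VALUES (%s, %s, %s, %s)")
        with conn.cursor() as cur:
            # PyMySQL rewrites executemany on this statement form into one multi-row INSERT
            cur.executemany(sql, rows)
        conn.commit()

    bulk_insert([(42, "k1", "v1", "2016-08-04"),
                 (42, "k2", "v2", "2016-08-04")])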

First, for the interface there are several things to consider:

  1. Is the bandwidth sufficient?

  2. CPU cores: if there are 4 cores and 4 php-fpm workers, and each request needs 50-150 ms of processing time, you can estimate roughly how many requests one machine handles per second (see the back-of-envelope calculation after this list).

  3. Memory: each process takes up roughly 10-25 MB.

Beyond that you can consider load balancing and DNS round robin, and also pay attention to the high availability of the cluster.
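
A back-of-envelope calculation under the assumptions mentioned in the thread (about 10,000 clients, one submission every 10 seconds, 4 workers, ~100 ms per request); the figures are illustrative, not measured:

    # Rough capacity estimate -- every figure below is an assumption from the thread.
    clients = 10000          # "about tens of thousands" of clients
    interval_s = 10          # each client submits once every 10 seconds
    workers = 4              # php-fpm workers, one per core in this example
    avg_request_s = 0.1      # 50-150 ms per request, ~100 ms as a midpoint

    incoming_rps = clients / float(interval_s)          # ~1,000 requests/second overall
    per_machine_rps = workers / avg_request_s           # ~40 requests/second per machine
    machines_needed = incoming_rps / per_machine_rps    # ~25 machines at this cost per request

    print(incoming_rps, per_machine_rps, machines_needed)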

Second, for Redis there are several things to consider:

  1. Row size: how long is one row? Redis performance starts to degrade once values exceed roughly 1 KB.

  2. Processing speed: how much data will pile up in the queue, and how much memory will it occupy?

  3. Redis architecture: how do you make sure data is not lost, and how do you achieve high availability?

  4. Whether the current resources allow this solution, and whether there are other options (a consumer sketch follows this list).
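
A rough sketch of the scheduled task that drains one per-user daily queue into MySQL, assuming redis-py, PyMySQL, and the same hypothetical user_data schema as above:

    import json

    import pymysql
    import redis

    r = redis.Redis()
    conn = pymysql.connect(host="localhost", user="app", password="secret",
                           database="collector", charset="utf8mb4")

    def drain_queue(key, batch_size=1000):
        """Pop rows off one per-user daily queue in batches, write them with a
        merged INSERT IGNORE, then delete the key once the queue is empty."""
        while True:
            # redis-py pipelines default to MULTI/EXEC, so read + trim happen atomically
            pipe = r.pipeline()
            pipe.lrange(key, 0, batch_size - 1)
            pipe.ltrim(key, batch_size, -1)
            raw, _ = pipe.execute()
            if not raw:
                break
            rows = [json.loads(item) for item in raw]
            params = [(row["user_id"], row["item_key"], row["item_value"], row["collected_on"])
                      for row in rows]
            with conn.cursor() as cur:
                cur.executemany(
                    "INSERT IGNORE INTO user_data (user_id, item_key, item_value, collected_on) "
                    "VALUES (%s, %s, %s, %s)", params)
            # Caveat: if this commit fails after LTRIM the batch is lost; a production job
            # would need a safer handoff (e.g. RPOPLPUSH onto a processing list).
            conn.commit()
        r.delete(key)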

Can you write to more than one node concurrently? Then use an active-active (dual-master) setup to cut the concurrent write pressure on each node by 50%.

Use MyCat (a database sharding middleware for MySQL).

You can shard the database, using consistent hashing or a simple hash over ID ranges; that should be enough. If that feels like too much trouble, separate reads and writes first and watch the load.
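
A minimal illustration of hash-based routing of a user to a shard, assuming a fixed shard count and a made-up table naming scheme; consistent hashing would be used instead if shards must be added without remapping most keys:

    import zlib

    SHARD_COUNT = 16  # assumed fixed number of shards/tables

    def shard_for_user(user_id):
        """Simple hash routing: map a user id onto one of SHARD_COUNT shards."""
        return zlib.crc32(str(user_id).encode("utf-8")) % SHARD_COUNT

    def table_for_user(user_id):
        # Hypothetical naming scheme: user_data_00 ... user_data_15
        return "user_data_%02d" % shard_for_user(user_id)

    print(table_for_user(42))  # the same user id always maps to the same table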

Try using a queue?

The questioner said that data generation is relatively concentrated, so you can consider using queued tasks to stretch the concentrated burst over a slightly longer period and smooth out the writes. You need to find a reasonable balance between the write-to-read delay and the smoothing window. If there really is no room for compromise, go with the heavier approaches mentioned above. Also, if you don't want to touch the database layout, you can try writing to a dump file first and then importing it in bulk; I'm not sure whether that counts as an unorthodox trick.
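
A rough sketch of the dump-file idea, assuming PyMySQL with local_infile enabled on both server and client, plus a hypothetical tab-separated file; LOAD DATA INFILE is generally much faster than row-by-row inserts:

    import csv

    import pymysql

    def bulk_load(rows, path="/tmp/user_data.tsv"):
        """Write rows to a tab-separated dump file, then import it in one statement."""
        with open(path, "w", newline="") as f:
            writer = csv.writer(f, delimiter="\t")
            writer.writerows(rows)  # rows of (user_id, item_key, item_value, collected_on)

        conn = pymysql.connect(host="localhost", user="app", password="secret",
                               database="collector", local_infile=True)
        with conn.cursor() as cur:
            # path is a trusted local filename, so plain string formatting is acceptable here
            cur.execute(
                "LOAD DATA LOCAL INFILE '%s' INTO TABLE user_data "
                "FIELDS TERMINATED BY '\\t' "
                "(user_id, item_key, item_value, collected_on)" % path)
        conn.commit()

    bulk_load([(42, "k1", "v1", "2016-08-04"), (42, "k2", "v2", "2016-08-04")])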

-1. Submitting 100 items every 10 seconds is fairly aggressive. Assuming your data can tolerate some loss, you can consider buffering on the client (caching data on the client is admittedly a risky approach), e.g. accumulate 200 items and submit once every 20 seconds (see the sketch after this list).
-2. The server can use task queues to reduce blocking and thereby improve concurrency (with every client submitting once every 10 seconds, high concurrency is easy to hit).

-3. Also consider whether the data is frequently read as well as written; otherwise a cache such as ehcache is worth considering, though cluster synchronization brings extra overhead.

-4. A workload this special should definitely not share a server with other businesses.

-5. As for how to split the tables later, that depends on your business.
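
A minimal sketch of the client-side buffering idea from point 1, assuming the rows can safely sit in memory for a short while; send_to_server and collect_batch are hypothetical placeholders:

    import time

    BATCH_SIZE = 200   # roughly two collection cycles
    MAX_WAIT_S = 20    # flush at least every 20 seconds

    def send_to_server(rows):
        """Hypothetical upload; the real client would do an HTTP POST here."""
        print("sending %d rows" % len(rows))

    def run_client(collect_batch):
        """collect_batch() is assumed to return the ~100 rows gathered in the last cycle."""
        buffer, last_flush = [], time.time()
        while True:
            buffer.extend(collect_batch())
            if len(buffer) >= BATCH_SIZE or time.time() - last_flush >= MAX_WAIT_S:
                send_to_server(buffer)   # note: buffered rows are lost if the client crashes
                buffer, last_flush = [], time.time()
            time.sleep(10)               # the client collects once every 10 seconds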
