In distributed, large-scale data collection, managing information sources is particularly important. To ensure that a task is processed by only one collector at a time, task scheduling must guarantee uniqueness. A distributed collection system therefore usually includes a scheduling module whose main responsibility is to distribute collection tasks and ensure that each task is handed out only once.
Because the system is distributed, it involves multiple servers (machines), each server runs multiple collectors (processes), and each collector may use multiple threads. The lock mechanism in the task scheduling module is therefore particularly important. Depending on the application's architecture, lock implementations usually fall into the following types:
If the handler is single-process and multi-threaded, in Python you can use the Lock object of the threading module to synchronize access to shared variables and achieve thread safety.
With multiple processes on a single machine, in Python you can use the Lock object of the multiprocessing module.
With a multi-machine, multi-process deployment, you have to rely on a third-party component that stores the lock object to implement a distributed synchronization lock.
Since the scheduling module runs across multiple machines, processes, and threads, the third approach applies.
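As a point of comparison for the first case, here is a minimal sketch of a thread lock in Python. The counter, the function name and the thread count are illustrative assumptions, not part of the scheduling module.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # guards every update to `counter`


def add_many(n: int) -> None:
    """Increment the shared counter n times, one locked update at a time."""
    global counter
    for _ in range(n):
        with counter_lock:  # acquired on entry, released on exit (even on error)
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

multiprocessing.Lock offers the same acquire/release and `with` interface for processes on one machine. Neither object is visible to other machines, which is why the multi-machine case needs an external lock store.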
Distributed lock implementation methods
The current mainstream distributed lock implementation methods are as follows:
Based on a database, such as MySQL
Based on a cache, such as Redis
Based on ZooKeeper
Each method has its own merits. After weighing them, Redis is the most suitable choice for this scheduling module. The main reasons are:
Redis operates in memory, so access is faster than with a disk-based database, and performance does not drop much after locking under high concurrency
Redis can set a time to live (TTL) on keys
Redis is simple to use and has a low overall implementation overhead
However, a distributed lock implemented with Redis also needs to meet the following conditions:
Only one thread can hold the lock at a time; other threads must wait until it is released
The lock operation must be atomic
No deadlock may occur, for example when a thread that has acquired the lock exits abnormally before releasing it, leaving other threads waiting in a loop for the release
The lock must be acquired and released by the same thread
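The atomicity condition is commonly met by taking the lock with a single "set if absent, with expiry" command (in Redis, SET with the NX and PX options) instead of a separate existence check followed by a write. The sketch below only mimics that command's semantics in memory so that it runs without a Redis server. The InMemoryStore class, its set_nx_px method and the 30-second TTL are hypothetical stand-ins, not the author's code.

```python
import time
import uuid


class InMemoryStore:
    """Hypothetical stand-in that mimics Redis `SET key value NX PX ttl`."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expiry timestamp in seconds)
        self._clock = clock

    def set_nx_px(self, key, value, ttl_ms):
        """Store value only if key is absent or expired; return True on success."""
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False     # a live lock is already held
        self._data[key] = (value, now + ttl_ms / 1000.0)
        return True


def try_acquire(store, key, ttl_ms):
    """Return a unique owner token if the lock was taken, otherwise None."""
    token = uuid.uuid4().hex
    return token if store.set_nx_px(key, token, ttl_ms) else None


store = InMemoryStore()
first = try_acquire(store, "Dispatcher_Task_Lock", ttl_ms=30_000)
second = try_acquire(store, "Dispatcher_Task_Lock", ttl_ms=30_000)
print(first is not None, second is None)  # True True
```

The stand-in is not itself thread-safe; on a real server the single command is what provides atomicity. With the redis-py client, the corresponding call would be `client.set(key, token, nx=True, px=ttl_ms)`.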
We use Redis to implement a distributed synchronization lock to ensure data consistency. The implementation has the following characteristics:
It satisfies mutual exclusion: only one thread can acquire the lock at a time
It uses the Redis TTL to ensure that no deadlock occurs. However, if a lock expires while its holder is still working, multiple threads may hold the lock at the same time, so the expiration time must be set reasonably to avoid this
It gives each lock a unique identifier so that the lock is not accidentally deleted by another thread
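The last point is usually implemented as a compare-and-delete: the lock is removed only if its stored value equals the caller's own token. On a real Redis server the comparison and the deletion have to run atomically, which is typically done with a small Lua script. The sketch below shows such a script as text and mirrors its logic on a plain dict. The function name and the tokens are illustrative assumptions.

```python
# Lua script commonly used for an atomic compare-and-delete on Redis:
# delete the key only when its value matches the caller's token.
RELEASE_SCRIPT = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
else
    return 0
end
"""


def release(locks: dict, key: str, token: str) -> bool:
    """Mirror RELEASE_SCRIPT on a dict: remove key only if token matches."""
    if locks.get(key) == token:
        del locks[key]
        return True
    return False


locks = {"Dispatcher_Task_Lock": "token-A"}  # collector A holds the lock
print(release(locks, "Dispatcher_Task_Lock", "token-B"))  # False: B is not the owner
print(release(locks, "Dispatcher_Task_Lock", "token-A"))  # True: A releases its own lock
```

Without the token check, a holder whose lock had already expired could delete a lock that now belongs to someone else.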
In practice, I separated the scheduling module from the rest of the collection system and built it as an independent Spring Boot service on top of the Java client JRedis (a high-performance Java client for connecting to the Redis distributed hash key-value database). The service provides synchronous and asynchronous functions and lets collectors request the collection tasks to be processed over HTTP. The process is roughly as follows:
The collector sends a task request to the dispatching center through HTTP;
The dispatching center checks whether the lock exists; if it does, an empty set is returned immediately;
If the lock does not exist, the dispatching center sets the lock for this request and then obtains the corresponding collection tasks according to the source rules;
Return the acquired task (if there is no pending task, return empty), and then delete the lock.
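The four steps above can be sketched as follows. This is a simplified in-memory illustration, not the author's service: the `locks` and `pending` dicts and the `handle_request` function are hypothetical, and the existence check and the write are two separate steps here, whereas on Redis they would be combined into one atomic command.

```python
import uuid

locks = {}                                # hypothetical stand-in for the Redis lock record
pending = {"news": ["task-1", "task-2"]}  # hypothetical pending tasks per source


def handle_request(source: str) -> list:
    """Serve one collector request following the four steps above."""
    lock_key = "Dispatcher_Task_Lock"
    if lock_key in locks:                 # step 2: lock exists, return an empty set
        return []
    token = uuid.uuid4().hex
    locks[lock_key] = token               # step 3: lock the request
    try:
        tasks = pending.get(source, [])   # step 3: fetch the tasks for this source
        pending[source] = []              # hand each task out only once
        return tasks                      # step 4: return the tasks (possibly empty)
    finally:
        if locks.get(lock_key) == token:  # step 4: delete the lock we own
            del locks[lock_key]


print(handle_request("news"))  # ['task-1', 'task-2']
print(handle_request("news"))  # []
```

Releasing in a `finally` block means the lock is removed even when fetching the tasks raises an exception.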
The code implementation of the scheduling module is roughly as follows:
public static List<...> ...(
        HashServiceInterface hif, ZSetServiceInterface zScoreSet, String dicName) {
    List<...> result = ...;
    try {
        String dicNameLock = "Dispatcher_Task_Lock"; // Task scheduling lock
        if (!redisHashUtils.keyIsExit(dicNameLock, lockKeyValue)) { // Determine whether the lock exists
            // Add a lock (write the task uniqueness identifier into the record)
            redisHashUtils.addOneData(dicNameLock, lockKeyValue, DateUtil.getYMDHMS());
            // Processing task logic
            .......
        } else { // The lock already exists
            System.out.println("Processing task, temporarily return the empty collection....");
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return result;
}
During actual operation, when adding a lock, you must also set an expiration time on the lock. Otherwise, if some unknown exception occurs, the lock may not be released and the collector will never be able to obtain a collection task.
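To illustrate why the expiration time matters, the sketch below models a lock whose entry expires after a fixed TTL. Timestamps are passed in explicitly so the behaviour is deterministic. The ExpiringLock class and the 30-second TTL are hypothetical, and on Redis the expiry would be set on the key itself.

```python
class ExpiringLock:
    """Hypothetical single-key lock whose entry expires after ttl seconds."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self.expires_at = None  # None means the lock has never been taken

    def try_acquire(self, now: float) -> bool:
        """Take the lock if it is free or the previous holder's TTL has passed."""
        if self.expires_at is not None and now < self.expires_at:
            return False
        self.expires_at = now + self.ttl
        return True


lock = ExpiringLock(ttl=30.0)
print(lock.try_acquire(now=0.0))   # True: the first dispatcher takes the lock
# Suppose that dispatcher crashes here and never releases the lock.
print(lock.try_acquire(now=10.0))  # False: the lock is still within its TTL
print(lock.try_acquire(now=31.0))  # True: the TTL elapsed, so the lock is free again
```

Without a TTL the second and third attempts would both fail, and every later request would receive an empty result.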