Apache Storm cluster architecture


The main highlight of Apache Storm is that it is a fault-tolerant, fast, and no "single point of failure" (SPOF) distributed application. We can install Apache Storm in multiple systems as needed to increase the capacity of the application.

Let's take a look at how the Apache Storm cluster is designed and its internal architecture. The following diagram depicts the cluster design.

core_concept.jpg

Apache Storm has two types of nodes, Nimbus (master node) and Supervisor (worker node). Nimbus is a core component of Apache Storm. The main job of Nimbus is to run the Storm topology. Nimbus analyzes the topology and collects tasks to perform. It then assigns tasks to available supervisors.

Supervisor will have one or more worker processes. Supervisor delegates tasks to worker processes. The worker process will spawn as many executors and run tasks as needed. Apache Storm uses an internal distributed messaging system for communication between Nimbus and the hypervisor.

ComponentDescription
Nimbus (master node)Nimbus is a Storm cluster the master node. All other nodes in the cluster are called worker nodes. The master node is responsible for distributing data among all worker nodes, assigning tasks to worker nodes and monitoring failures.
Supervisor (worker node) Nodes that follow instructions are called Supervisors. Supervisor has multiple worker processes, and it manages the worker processes to complete tasks assigned by nimbus.
Worker process The worker process will perform tasks related to a specific topology. Instead of running tasks themselves, worker processes create executors and ask them to perform specific tasks. Worker processes will have multiple executors.
ExecutorThe executor is just a single thread generated by the worker process. An executor runs one or more tasks, but only for a specific spout or bolt.
Task Task performs the actual data processing. So, it is a spout or bolt.
ZooKeeper framework (ZooKeeper framework)

Apache's ZooKeeper is a cluster (node ​​group) that uses itself and maintains shared data between itself and has powerful synchronization technology Coordinated services. Nimbus is stateless, so it relies on ZooKeeper to monitor the status of worker nodes.

ZooKeeper helps supervisor interact with nimbus. It is responsible for maintaining the status of nimbus and supervisor.

Storm is stateless. Even though the stateless nature has its own drawbacks, it actually helps Storm handle real-time data in the best possible and fastest way.

Although Storm is not completely stateless. It stores its state in Apache ZooKeeper. Since the state is available in Apache ZooKeeper, a failed network can be restarted and work from where it left off. Typically, a service monitoring tool like monit will monitor Nimbus and restart it in case of any failure.

Apache Storm also has an advanced topology called Trident topology, which has state maintenance and also provides a high-level API like Pig. We will discuss all these features in the following chapters.