Apache Storm core concepts
Apache Storm reads a raw stream of real-time data at one end, passes it through a series of small processing units, and emits the processed, useful information at the other end.
The following figure describes the core concepts of Apache Storm.

Now let's take a closer look at the components of Apache Storm. To see how they fit together, consider a real-time example, "Twitter Analytics", and how it is modeled in Apache Storm. The diagram below depicts the structure.

The input to "Twitter Analytics" comes from the Twitter Streaming API. Spout will use the Twitter Streaming API to read the user's tweets and output them as a stream of tuples. A single tuple from the spout will have the twitter username and a single tweet as comma separated values. The steam of this tuple will then be forwarded to Bolt, and Bolt splits the tweet into individual words, counts the word count, and saves the information to the configured data source. Now we can easily get the results by querying the data source.
Topology
Spouts and Bolts are connected together to form a topology. Real-time application logic is specified in the Storm topology. Simply put, a topology is a directed graph where vertices are computations and edges are data flows.
A simple topology starts with one or more Spouts. A Spout emits data to one or more Bolts. A Bolt is a node in the topology containing the smallest unit of processing logic, and the output of one Bolt can be fed into another Bolt as input.
Storm keeps the topology running until you kill it. The main job of Apache Storm is to run topologies, and it can run any number of topologies at a given time.
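To make this concrete, here is a minimal sketch of how the Twitter example could be wired together with Storm's Java TopologyBuilder. The component names, parallelism numbers, and the TweetSpout and WordCountBolt classes are assumptions for illustration (TweetSplitBolt is the sketch from earlier).

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class TwitterAnalyticsTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Spout: reads tweets and emits (username, tweet) tuples.
        builder.setSpout("tweet-spout", new TweetSpout(), 1);

        // Bolt: splits each tweet into words; spout tuples are shuffled across its tasks.
        builder.setBolt("split-bolt", new TweetSplitBolt(), 2)
               .shuffleGrouping("tweet-spout");

        // Bolt: counts words; tuples with the same "word" always reach the same task.
        builder.setBolt("count-bolt", new WordCountBolt(), 2)
               .fieldsGrouping("split-bolt", new Fields("word"));

        Config conf = new Config();
        // Storm keeps the submitted topology running until it is explicitly killed.
        StormSubmitter.submitTopology("twitter-analytics", conf, builder.createTopology());
    }
}
```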
Tasks
Now you have a basic idea of Spouts and Bolts. They are the smallest logical units of a topology, and a topology is built from a single Spout and an array of Bolts. They must be executed in the proper order for the topology to run successfully. Each Spout and Bolt executed by Storm is called a "task". Simply put, a task is the execution of a Spout or a Bolt. Each Spout and Bolt can have multiple instances running in separate threads at any given time.
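In the Java API, the number of threads and tasks for a component is set when it is declared. The sketch below continues the hypothetical builder from the Twitter example: the third argument to setBolt is the parallelism hint (number of executor threads), and setNumTasks sets how many task instances are spread across those threads.

```java
// Run TweetSplitBolt in 2 executor threads, with 4 task instances spread across them.
builder.setBolt("split-bolt", new TweetSplitBolt(), 2)
       .setNumTasks(4)
       .shuffleGrouping("tweet-spout");
```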
Workers
A topology runs in a distributed manner on multiple worker nodes. Storm spreads the tasks evenly across all the worker nodes. A worker node's role is to listen for jobs and to start or stop worker processes whenever a new job arrives.
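The number of worker processes a topology uses is part of its configuration. A minimal sketch, again using the hypothetical Twitter topology from above:

```java
// Ask Storm to distribute this topology's tasks across 3 worker processes.
Config conf = new Config();
conf.setNumWorkers(3);
StormSubmitter.submitTopology("twitter-analytics", conf, builder.createTopology());
```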
Stream grouping
Data flows from Spouts to Bolts, or from one Bolt to another Bolt. Stream grouping controls how tuples are routed in the topology and helps us understand the flow of tuples through it. Four commonly used built-in groupings are described below.
Shuffle grouping
In shuffle grouping, tuples are distributed randomly so that each task executing the Bolt receives roughly an equal number of them. The diagram below depicts the structure.
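In the Java API this is declared with shuffleGrouping, as in the hypothetical wiring above:

```java
// Each task of "split-bolt" receives a roughly equal, randomly chosen share of the tuples.
builder.setBolt("split-bolt", new TweetSplitBolt(), 4)
       .shuffleGrouping("tweet-spout");
```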

Fields grouping
In fields grouping, the stream is partitioned by the value of one or more fields in the tuple. Tuples carrying the same field value are always sent to the same task executing the Bolt, while tuples with different values may go to different tasks. For example, if the stream is grouped by the field "word", tuples containing the same string "Hello" will always be routed to the same task. The following diagram shows how fields grouping works.
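In the Java API this is declared with fieldsGrouping, naming the field(s) to partition on:

```java
// Tuples are partitioned by the "word" field: the same word always reaches the same task.
builder.setBolt("count-bolt", new WordCountBolt(), 4)
       .fieldsGrouping("split-bolt", new Fields("word"));
```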

Global Grouping
Global grouping forwards the entire stream to a single task of the Bolt. Tuples generated by all instances of the source component are sent to one target instance (specifically, the task with the lowest ID is selected).
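In the Java API this is declared with globalGrouping; the ReportBolt class and component names below are illustrative:

```java
// The entire stream is sent to a single task of "report-bolt" (the one with the lowest ID).
builder.setBolt("report-bolt", new ReportBolt(), 4)
       .globalGrouping("count-bolt");
```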

All Grouping
All grouping sends a copy of each tuple to every instance of the receiving Bolt. This kind of grouping is typically used to send signal tuples to Bolts, for example to tell every task to refresh its state or configuration.
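In the Java API this is declared with allGrouping; the "signal-spout" component below is an assumed source of signal tuples:

```java
// Every task of "count-bolt" receives a copy of each tuple emitted by "signal-spout".
builder.setBolt("count-bolt", new WordCountBolt(), 4)
       .allGrouping("signal-spout");
```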
