Apache Storm distributed messaging system


Apache Storm processes real-time data, and the input usually comes from a message queuing system. An external distributed messaging system will provide the input required for real-time computation. Spout will read data from the messaging system, convert it into tuples and feed it into Apache Storm. Interestingly, Apache Storm uses its own distributed messaging system internally for communication between its nimbus and supervisor.

What is a distributed messaging system?

Distributed messaging is based on the concept of reliable message queues. Messages are queued asynchronously between the client application and the messaging system. Distributed messaging systems provide the benefits of reliability, scalability, and durability.

Most messaging patterns follow the publish-subscribe model (referred to as publish-subscribe), where the sender of a message is called the publisher, And those who want to receive messages are called subscribers.

Once the message has been published by the sender, the subscriber can receive the selected message with the help of filtering options. Usually we have two types of filtering, one is topic-based filtering and the other is content-based filtering.

It should be noted that the pub-sub model can only communicate through messages. It's a very loosely coupled architecture; even senders don't know who their subscribers are. Many messaging patterns enable message brokers to exchange published messages for timely access by many subscribers. A real life example is Dish TV which publishes different channels like Sports, Movies, Music etc. Anyone can subscribe to their own set of channels and get the channels they subscribe to when available.

messaging_system.jpg

The following table describes some of the popular high-throughput messaging systems -

Distributed Messaging SystemsDescription
Apache KafkaKafka was developed at LinkedIn, and later it became a sub-project of Apache. Apache Kafka is based on a brokerenabled, durable, distributed publish-subscribe model. Kafka is fast, scalable and efficient.
RabbitMQRabbitMQ is an open source distributed robust messaging application. It's easy to use and runs on all platforms.
JMS (Java Message Service)JMS is an open source API that supports creating, reading, and sending messages from one application to another. It provides guaranteed messaging and follows a publish-subscribe model.
ActiveMQActiveMQ messaging system is an open source API for JMS.
ZeroMQZeroMQ is agentless peer message processing. It provides push-pull, router-reseller messaging modes.
KestrelKestrel is a fast, reliable, and simple distributed message queue.

Thrift Protocol

Thrift is built on Facebook for cross-language service development and remote procedure calls (RPC). Later, it became an open source Apache project. Apache Thrift is an interface definition language that allows new data types and service implementations to be defined on top of defined data types in an easy way. Apache Thrift is also a communication framework that supports embedded systems, mobile applications, web applications and many other programming languages. Some of the key features associated with Apache Thrift are its modularity, flexibility, and high performance. Furthermore, it can perform streaming, messaging and RPC in distributed applications.

Storm makes extensive use of the Thrift protocol for internal communication and data definition. Storm topology is just

Thrift Structs

. Storm Nimbus is a Thrift service that runs the topology in Apache Storm.