Monitoring a topology using the Storm UI

This section covers how we can monitor the Storm cluster through the Storm UI. Let's first start with the definition of monitoring. Monitoring is used to track the health of the various components running in a cluster. An administrator uses the statistics collected through monitoring to spot errors or bottlenecks in the cluster. The Storm UI daemon provides the following important information:

• Cluster Summary: This portion of the Storm UI shows the version of Storm deployed in the cluster, the uptime of the Nimbus node, the number of free worker slots, the number of used worker slots, and so on. Before submitting a topology to the cluster, make sure that the value of the Free slots column is not zero; otherwise, the topology will not get any worker for processing and will wait in the queue until a worker becomes free.

• Nimbus Configuration: This portion of the Storm UI shows the configuration of the Nimbus node.

• Supervisor summary: This portion of the Storm UI shows the list of supervisor nodes running in the cluster along with their Id, Host, Uptime, Slots, and Used slots columns.

• Topology summary: This portion of the Storm UI shows the list of topologies running in the Storm cluster along with their ID, number of workers assigned to the topology, number of executors, number of tasks, uptime, and so on.
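The free-slot check described under Cluster Summary can also be scripted. The following is a minimal sketch, assuming a Storm release whose UI daemon exposes the REST endpoint /api/v1/cluster/summary (the earliest 0.9.x releases do not) and that the response carries a slotsFree field. The host, port, and helper names are hypothetical and not part of the sample project.

```python
import json
from urllib.request import urlopen

# Assumption: the Storm UI daemon listens here and serves the REST API.
UI_URL = "http://localhost:8080"  # hypothetical host and port


def fetch_cluster_summary(ui_url=UI_URL):
    """Fetch the cluster summary as a dict (requires a running Storm UI)."""
    with urlopen(ui_url + "/api/v1/cluster/summary") as response:
        return json.loads(response.read().decode("utf-8"))


def has_free_slots(summary, workers_needed=1):
    """Return True if the cluster reports at least workers_needed free slots."""
    return int(summary.get("slotsFree", 0)) >= workers_needed


# Hand-written sample shaped like the summary; the values are illustrative only.
sample = {"slotsTotal": 8, "slotsUsed": 5, "slotsFree": 3}
print(has_free_slots(sample, workers_needed=3))
```

Against a live cluster, has_free_slots(fetch_cluster_summary(), 3) would perform the same check before submitting a topology that asks for three workers.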

Let's deploy the sample topology (if not running already) in a remote Storm cluster by running the following command:

bin/storm jar $STORM_PROJECT_HOME/target/storm-example-0.0.1-SNAPSHOT-jar-with-dependencies.jar com.learningstorm.storm_example.LearningStormSingleNodeTopology LearningStormClusterTopology

As mentioned in the Configuring parallelism at the code level section of Chapter 2, Setting Up a Storm Cluster, we created the LearningStormClusterTopology topology by defining three worker processes, two executors for LearningStormSpout, and four executors for LearningStormBolt.
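The parallelism settings of the sample topology can be tallied with a little arithmetic. The following sketch is not Storm code; it only models the rule that a component gets one task per executor when no task count is set. The dictionary layout and the function name are illustrative.

```python
# Parallelism settings of LearningStormClusterTopology as described in the text.
NUM_WORKERS = 3
EXECUTORS = {"LearningStormSpout": 2, "LearningStormBolt": 4}


def default_tasks(executors, explicit_tasks=None):
    """Tasks per component: an explicit count if given, else one per executor."""
    explicit_tasks = explicit_tasks or {}
    return {name: explicit_tasks.get(name, count) for name, count in executors.items()}


tasks = default_tasks(EXECUTORS)
print(tasks)
print(sum(EXECUTORS.values()), "component executors across", NUM_WORKERS, "workers")
```

The executor count that the Storm UI reports for the whole topology may be higher than this sum, because system components such as ackers also run in executors.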

After submitting LearningStormClusterTopology on the Storm cluster, the user has to refresh the Storm home page.

The following screenshot shows that a row is added for LearningStormClusterTopology in the Topology summary section.

The topology section contains the name of the topology, the unique ID of the topology, the status of the topology, the uptime, the number of workers assigned to the topology, and so on. The possible values of the status field are ACTIVE, KILLED, and INACTIVE.

The home page of the Storm UI after deploying the sample topology

Let's click on LearningStormClusterTopology to view its detailed statistics. This is shown in the following screenshot:

The statistics of LearningStormClusterTopology

The preceding screenshot shows the statistics of the bolt and spout running in LearningStormClusterTopology. The screenshot contains the following major sections:

• Topology actions: This section allows us to activate, deactivate, rebalance, and kill the topology directly through the Storm UI.

• Topology stats: This section gives information about the number of tuples emitted, transferred, and acknowledged, the complete latency, and so on, within windows of 10 minutes, 3 hours, and 1 day, and since the start of the topology.

• Spouts (All time): This section shows the statistics of all the spouts running inside a topology. The following is the major information about a spout:

° Executors: This column gives details about the number of executors assigned to LearningStormSpout. The value of the number of executors is two for LearningStormSpout because we have started LearningStormClusterTopology by assigning two executors for LearningStormSpout.

° Tasks: This column gives details about the number of tasks assigned to LearningStormSpout. As explained in Chapter 2, Setting Up a Storm Cluster, the tasks will run inside the executors, and if we don't specify the tasks, then Storm will automatically assign one task per executor. Hence, the number of tasks of LearningStormSpout is equal to the number of executors assigned to LearningStormSpout.

° Emitted: This column gives details about the number of records emitted all time by LearningStormSpout.

° Port: This column defines the worker port assigned to LearningStormSpout.

° Transferred: This column gives details about the number of records transferred all time by LearningStormSpout.

° Complete latency (ms): This column gives the complete latency of a tuple, that is, the time between the moment the spout emits the tuple and the moment the ACK tree for the tuple is completed.

The difference between the emitted and transferred records is that the term emitted signifies the number of times the emit method of the OutputCollector class is called. On the other hand, the term transferred signifies the number of tuples actually sent to other tasks.

For example, if bolt Y has two tasks and subscribes to bolt X using the all grouping type, then the value of transferred for bolt X is twice its value of emitted (2x transferred for x emitted). Similarly, if bolt X emits a stream to which no one is subscribed, then the value of transferred for that stream is zero.
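The relationship between the two counters can be modeled in a few lines. The following sketch is a simplified model rather than Storm's own bookkeeping: it assumes that the all grouping delivers each tuple to every task of the subscriber and that other groupings deliver it to a single task. The function name and the input layout are illustrative.

```python
def count_emitted_and_transferred(num_emits, subscribers):
    """Model the two counters for one stream of a component.

    subscribers: list of (grouping, task_count) pairs, one per subscribing bolt.
    """
    transferred_per_emit = 0
    for grouping, task_count in subscribers:
        if grouping == "all":
            # All grouping replicates each tuple to every task of the subscriber.
            transferred_per_emit += task_count
        else:
            # Shuffle, fields, and similar groupings pick a single target task.
            transferred_per_emit += 1
    return num_emits, num_emits * transferred_per_emit


# Bolt X emits 100 tuples; bolt Y (two tasks) subscribes with all grouping.
print(count_emitted_and_transferred(100, [("all", 2)]))  # (100, 200)
# Bolt X emits 100 tuples on a stream nobody subscribes to.
print(count_emitted_and_transferred(100, []))  # (100, 0)
```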

• Bolts (All time): This section shows the statistics of all the bolts running inside a topology. Here is some important information about a bolt:

° Executors: This column gives details about the number of executors assigned to LearningStormBolt. The value of the number of executors is four for LearningStormBolt because we have started LearningStormClusterTopology by assigning four executors to LearningStormBolt.

° Tasks: This column gives details about the number of tasks assigned to LearningStormBolt. As explained in Chapter 2, Setting Up a Storm Cluster, the tasks will run inside the executors, and if we don't specify the tasks, then Storm will automatically assign one task per executor. Hence, the number of tasks of LearningStormBolt is equal to the number of executors assigned to LearningStormBolt.

° Emitted: This column gives details about the number of records emitted all time by LearningStormBolt.

° Port: This column defines the worker port assigned to LearningStormBolt.

° Transferred: This column gives the details about the number of records transferred all time by LearningStormBolt.

° Capacity (last 10m): The capacity metric is very important for monitoring the performance of the bolt. It gives the fraction of time the bolt spent actually processing tuples in the last 10 minutes. If the value of the Capacity (last 10m) column is close to 1, then the bolt is at capacity, and we will need to increase the parallelism of the bolt to avoid an "at capacity" situation. An "at capacity" situation is a bottleneck for the topology because if spouts start emitting tuples at a faster rate, then most of the tuples will time out and the spout will need to re-emit them into the pipeline.

° Process latency (ms): Process latency is the average time (in milliseconds) between the moment the bolt receives a tuple and the moment it acknowledges that tuple.

° Execute latency (ms): Execute latency is the average time (in milliseconds) a tuple spends in the execute method of the bolt.
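The capacity figure can be approximated by hand from two other numbers on the same page. The following sketch assumes the commonly documented formula, in which capacity is the number of executed tuples multiplied by the average execute latency and divided by the length of the window; the 0.9 alert threshold and the function names are arbitrary choices made for illustration.

```python
WINDOW_MS = 10 * 60 * 1000  # the 10-minute window used by "Capacity (last 10m)"


def bolt_capacity(executed, execute_latency_ms, window_ms=WINDOW_MS):
    """Fraction of the window the bolt spent executing tuples."""
    return executed * execute_latency_ms / window_ms


def needs_more_parallelism(capacity, threshold=0.9):
    """Flag a bolt whose capacity is near 1; the threshold is an arbitrary choice."""
    return capacity >= threshold


# Illustrative numbers: 550,000 tuples executed at 1 ms each within 10 minutes.
capacity = bolt_capacity(executed=550_000, execute_latency_ms=1.0)
print(round(capacity, 3), needs_more_parallelism(capacity))
```

A bolt flagged this way is a candidate for a higher executor count when the topology is next submitted or rebalanced.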

Let's click on the LearningStormSpout link to view the detailed statistics of a spout, as shown in the following screenshot:

The statistics of LearningStormSpout

The preceding screenshot shows that the tasks of LearningStormSpout are assigned to two executors. The screenshot also shows that the first executor is assigned to the supervisor1 machine and the second one is assigned to the supervisor2 machine.

Now, let's go to the previous page of the Storm UI and click on the LearningStormBolt link to view detailed statistics for the bolt, as shown in the following screenshot:

The statistics of LearningStormBolt

The preceding screenshot shows that the tasks of LearningStormBolt are assigned to four executors. The screenshot also shows that one executor is assigned to the supervisor1 machine and the remaining three executors are assigned to the supervisor2 machine. The Input stats (All time) section of the bolt shows the source of tuples for LearningStormBolt; in our case, the source is LearningStormSpout.

Again, go to the previous page and click on the Kill button to stop the topology.

While killing the topology, Storm will first deactivate the spouts and then wait for the kill time entered in the alert box, so that the bolts have a chance to finish processing the tuples that the spouts emitted before the kill command. The following screenshot shows how we can kill the topology through the Storm UI:

Killing a topology
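The same kill action can be triggered without the browser. The following is a minimal sketch, assuming a Storm release whose UI daemon exposes the REST endpoint /api/v1/topology/<id>/kill/<wait-time> (not present in the earliest 0.9.x releases). The UI address, the helper names, and the topology ID are hypothetical; the real ID is the one listed in the Topology summary section.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

UI_URL = "http://localhost:8080"  # hypothetical Storm UI address


def kill_url(topology_id, wait_secs, ui_url=UI_URL):
    """Build the REST URL that asks Storm to kill a topology after wait_secs."""
    if wait_secs < 0:
        raise ValueError("wait_secs must not be negative")
    return "{}/api/v1/topology/{}/kill/{}".format(
        ui_url, quote(topology_id, safe=""), wait_secs
    )


def kill_topology(topology_id, wait_secs=30):
    """Send the kill request (requires a running Storm UI with the REST API)."""
    request = Request(kill_url(topology_id, wait_secs), data=b"", method="POST")
    with urlopen(request) as response:
        return response.read().decode("utf-8")


# The topology ID below is made up for illustration.
print(kill_url("LearningStormClusterTopology-1-1400000000", 30))
```

On the command line, bin/storm kill LearningStormClusterTopology -w 30 requests the same 30-second wait before the topology is shut down.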

Let's go to the Storm UI's home page to check the status of LearningStormClusterTopology, as shown in the following screenshot:

The status of LearningStormClusterTopology
