ContextStreams Dashboard
The ContextStreams monitoring dashboard gives an overall view of the health of the ContextStream pipelines running in the system. It provides an overview of all ContextStreams applications and a detailed, application-wise view of metrics to help pinpoint the source of an issue, if any.
The dashboard reports the number of applications and instances running or failing, CPU and memory usage by each application, the latency and lag of polling from or committing to Kafka, the total number of records processed or dropped, and the count of exceptions encountered. This makes it a crucial tool for solutioning engineers to troubleshoot and maintain the health of the ContextStream pipelines within the system.
Accessing ContextStreams Dashboard
To access the ContextStreams Dashboard:
- Navigate to the left navigation menu and click on Dashboards.
- Run a search for the ContextStreams Dashboard.
- Click on the ContextStreams Dashboard to access it.
Dashboard’s Panels
The ContextStreams Dashboard is divided into the following sections:
- Stream Apps Overview: Gain insights into the health of ContextStream pipelines with metrics on running and failed applications, exceptions, and latency, facilitating quick identification of potential issues.
- Resource Usage Metrics: Monitor memory and CPU usage per instance to ensure efficient resource allocation and detect abnormalities, aiding in proactive resource management and optimization.
- Stream Metrics: Track processed records, poll rates, and latency to assess data processing efficiency, while monitoring running app instances for insights into pipeline health and performance.
- Plugin Metrics: Dive into plugin-level metrics to pinpoint bottlenecks and errors within the processing pipeline, with detailed insights into exception counts and record processing efficiency.
- Consumer Metrics: Monitor consumer lag and consumption rates to ensure timely data ingestion and processing, with visualizations of fetch rates and records consumed aiding in performance optimization.
- JVM Metrics: Keep an eye on JVM health with metrics on heap memory usage and garbage collection times, enabling proactive management to prevent performance degradation and outages.
At the top of the dashboard, you can apply filters to select specific App IDs and Instance IDs. These filters allow you to focus on particular ContextStream pipelines or instances, aiding in targeted analysis and troubleshooting.
Stream Apps Overview
This section gives a comprehensive overview of the health of the ContextStreams applications running in the system.
- Running Apps and Instances: Monitoring the number of running and failed applications and instances provides immediate visibility into any potential system-wide issues. An unexpected drop in the number of running apps or instances could indicate failures or bottlenecks within the system. Details of the Failed Apps and Instances can be checked from the Stream Metrics section.
- Exception Count and Record Metrics: Tracking exceptions and the number of dropped records helps pinpoint specific areas of concern within the pipeline. A sudden increase in exception counts or dropped records may indicate issues with data integrity, processing logic, or resource constraints. Plugin-wise Exception details can be found in the Plugin Metrics section.
- Latency Visualization: Visualizing poll and process latency allows engineers to identify any delays in data processing. High latency values may indicate performance bottlenecks, network issues, or resource contention, enabling engineers to prioritize troubleshooting efforts accordingly. Poll latency represents the time taken for the pipeline to retrieve records from Kafka, while process latency represents the time taken to process these records.
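The same poll and process latency figures can be read programmatically if the pipeline instances expose the standard Kafka Streams JMX metrics and run a Jolokia agent — both assumptions, since the exact instrumentation of ContextStreams is deployment-specific. The minimal sketch below queries the stream-thread metrics over Jolokia's HTTP read endpoint; the host, port, and thread ID are placeholders.

```python
# Minimal sketch: read poll/process latency from a stream instance via Jolokia.
# Assumptions: the instance is a Kafka Streams application exposing standard JMX
# metrics, and a Jolokia agent is attached on port 8778 (placeholder host/port).
import requests

JOLOKIA_URL = "http://localhost:8778/jolokia/read"  # placeholder host/port
THREAD_ID = "app-1-StreamThread-1"                  # placeholder thread id

def read_stream_metric(attribute: str) -> float:
    """Read one attribute from the Kafka Streams stream-thread metrics MBean."""
    mbean = f"kafka.streams:type=stream-thread-metrics,thread-id={THREAD_ID}"
    resp = requests.get(f"{JOLOKIA_URL}/{mbean}/{attribute}", timeout=5)
    resp.raise_for_status()
    return float(resp.json()["value"])

if __name__ == "__main__":
    poll_ms = read_stream_metric("poll-latency-avg")        # time to fetch records from Kafka
    process_ms = read_stream_metric("process-latency-avg")  # time to run records through the pipeline
    commit_ms = read_stream_metric("commit-latency-avg")    # time to commit offsets back to Kafka
    print(f"poll: {poll_ms:.1f} ms, process: {process_ms:.1f} ms, commit: {commit_ms:.1f} ms")
```

A sustained rise in poll latency usually points at the brokers or the network, while a rise in process latency points at the pipeline logic itself, which is why the dashboard plots them separately.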
Resource Usage Metrics
This section gives an overview of the Memory and CPU usage per instance of the selected Stream App.
- Memory and CPU Usage: Monitoring memory and CPU usage per instance provides insights into resource utilization patterns. Spikes or sustained high usage levels may indicate memory leaks, inefficient processing logic, or inadequate resource allocation, prompting further investigation and optimization.
- Time Series Visualization: Analyzing trends in memory and CPU usage over time enables engineers to detect gradual increases or sudden spikes, facilitating proactive resource management and capacity planning to prevent performance degradation or outages.
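The per-instance figures in this panel are the same signals that can be sampled directly from the operating system. The snippet below is only an illustration of where such numbers come from, not the dashboard's actual collection mechanism; it uses the psutil library, and the process ID is a placeholder.

```python
# Illustrative sketch only: sample memory and CPU for one process with psutil.
# The PID is a placeholder; the dashboard's own collection pipeline may differ.
import psutil

PID = 12345  # placeholder: PID of a ContextStreams instance on this host

def sample(pid: int, interval_s: float = 5.0, samples: int = 6) -> None:
    proc = psutil.Process(pid)
    for _ in range(samples):
        rss_mib = proc.memory_info().rss / (1024 * 1024)  # resident memory in MiB
        cpu_pct = proc.cpu_percent(interval=interval_s)   # CPU % over the interval
        print(f"rss={rss_mib:.0f} MiB cpu={cpu_pct:.1f}%")

if __name__ == "__main__":
    sample(PID)
```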
Stream Metrics
- Processed Records and Polls: Tracking the number of processed records and polling activities helps gauge the efficiency of data ingestion and processing. Discrepancies between expected and actual processing rates may signal issues with data availability, processing logic, or resource constraints.
- Running App Instances: Monitoring the status of running app instances provides insights into the health and availability of individual pipelines. Instances experiencing errors or failures may require immediate attention to prevent data loss or service disruptions.
- Latency and Rate Visualization: Visualizing end-to-end latency, poll rates, process rates, and commit latency enables engineers to identify performance bottlenecks and optimize data processing workflows. Deviations from expected latency or throughput levels may indicate underlying issues requiring investigation and remediation.
Plugin Metrics
- Plugin-Level Monitoring: Monitoring plugin metrics allows engineers to pinpoint specific components or stages within the data processing pipeline experiencing performance issues or errors. Identifying plugins with high latency, exception counts, or dropped records helps prioritize troubleshooting efforts and optimize processing logic.
- Exception Counts and Record Metrics: Tracking exception counts and record processing metrics at the plugin level provides granular insights into the health and efficiency of individual processing stages. Anomalies or discrepancies in exception counts or record processing rates may indicate plugin-specific issues requiring targeted investigation and resolution.
Consumer Metrics
- Consumer Lag and Consumption Rates: Monitoring consumer lag and consumption rates helps ensure timely data ingestion and processing. Detecting spikes in consumer lag or fluctuations in consumption rates allows engineers to identify potential bottlenecks, resource constraints, or data availability issues impacting pipeline performance.
- Fetch and Consumption Rate Visualization: Visualizing fetch rates and records consumed rates over time enables engineers to assess the efficiency of data retrieval and consumption processes. Deviations from expected fetch or consumption rates may indicate network issues, resource contention, or inefficient data processing workflows requiring optimization.
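Consumer lag is simply the difference between a partition's latest (log-end) offset and the offset the consumer group has committed. As a minimal sketch of that calculation, the snippet below uses the kafka-python client; the bootstrap server, group ID, and topic are placeholders for illustration.

```python
# Minimal sketch: compute per-partition consumer lag with kafka-python.
# Bootstrap server, group id, and topic are placeholders.
from kafka import KafkaConsumer, TopicPartition

BOOTSTRAP = "localhost:9092"       # placeholder broker address
GROUP_ID = "contextstream-app-1"   # placeholder consumer group
TOPIC = "input-topic"              # placeholder topic

consumer = KafkaConsumer(bootstrap_servers=BOOTSTRAP, group_id=GROUP_ID,
                         enable_auto_commit=False)
partitions = [TopicPartition(TOPIC, p) for p in consumer.partitions_for_topic(TOPIC)]
end_offsets = consumer.end_offsets(partitions)   # latest (log-end) offset per partition

total_lag = 0
for tp in partitions:
    committed = consumer.committed(tp) or 0      # last committed offset for the group
    lag = end_offsets[tp] - committed
    total_lag += lag
    print(f"{tp.topic}[{tp.partition}] lag={lag}")
print(f"total lag: {total_lag}")
consumer.close()
```

A steadily growing total lag means the pipeline is consuming slower than producers are writing, which is exactly the condition this panel is meant to surface early.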
JVM Metrics
- Heap Memory Usage and Garbage Collection: Monitoring JVM metrics such as heap memory usage and garbage collection times (Young and Old) helps ensure optimal resource utilization and stability. Sudden increases in memory usage or prolonged garbage collection times may indicate memory leaks, inefficient resource management, or garbage collection tuning issues requiring attention and optimization.
- Visualization of JVM Metrics: Visualizing JVM metrics over time enables engineers to detect trends, anomalies, or patterns indicative of underlying issues impacting system performance and stability. Proactively monitoring and analyzing JVM metrics facilitates timely intervention and optimization to prevent performance degradation or outages.
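Heap usage and garbage collection times come from the standard JVM MBeans (java.lang:type=Memory and java.lang:type=GarbageCollector), so they can also be inspected directly when a Jolokia agent is attached to the instance — an assumption for this sketch; the host and port are placeholders, and the GC MBean names depend on the collector in use.

```python
# Minimal sketch: read JVM heap usage and GC times through standard JVM MBeans
# via Jolokia. Assumes a Jolokia agent on the instance; host/port are placeholders.
import requests

JOLOKIA_URL = "http://localhost:8778/jolokia/read"  # placeholder host/port

def jolokia_read(mbean: str, attribute: str):
    resp = requests.get(f"{JOLOKIA_URL}/{mbean}/{attribute}", timeout=5)
    resp.raise_for_status()
    return resp.json()["value"]

heap = jolokia_read("java.lang:type=Memory", "HeapMemoryUsage")
print(f"heap used: {heap['used'] / 1e6:.0f} MB of {heap['max'] / 1e6:.0f} MB")

# Pattern read: one entry per collector (e.g. "G1 Young Generation", "G1 Old Generation").
gcs = jolokia_read("java.lang:type=GarbageCollector,name=*", "CollectionTime")
for mbean_name, attrs in gcs.items():
    print(f"{mbean_name}: total GC time {attrs['CollectionTime']} ms")
```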
Conclusion
In conclusion, the ContextStreams Dashboard serves as a comprehensive tool for solutioning engineers to monitor, troubleshoot, and optimize ContextStream pipelines within the system. By providing detailed insights into application health, resource usage, data processing metrics, plugin performance, consumer behavior, and JVM health, the dashboard equips engineers with the visibility needed to swiftly identify and address issues as they arise. With its user-friendly interface and rich visualizations, it empowers engineers to proactively manage system performance, ensure data integrity, and maintain the reliability of ContextStream pipelines, contributing to the seamless operation of the platform.
Kafka Cluster Monitoring
The Kafka Cluster Monitoring dashboard gives an overview of the Kafka Cluster service running for vuSmartMaps. The majority of data streaming and processing depends on the smooth functioning of the Kafka cluster, hence this dashboard provides a detailed view of the performance and functionality of the cluster. It shows information about the CPU, disk, and memory utilization, and data metrics like the rate of data being read and written to Kafka.
Accessing Kafka Cluster Monitoring Dashboard
To access the Kafka Cluster Monitoring Dashboard:
- Navigate to the left navigation menu and click on Dashboards.
- Run a search for the Kafka Cluster Monitoring Dashboard.
- Click on the Kafka Cluster Monitoring Dashboard to access it.
Dashboard’s Panels
The Kafka Cluster Monitoring Dashboard is divided into the following sections:
- Kafka Emitted Metrics: Provides essential information on various Kafka metrics emitted by the cluster, including replication status, request processing rates, and data transfer rates. End-users can monitor these metrics to assess the overall health and functionality of the Kafka cluster, enabling timely detection and resolution of potential issues impacting data streaming and processing operations.
- Host Level Metrics: Provides a detailed overview of individual Kafka cluster nodes, offering insights into memory usage, CPU utilization, disk space, and network activity. End-users can monitor these metrics to identify potential resource constraints or performance bottlenecks at the host level, enabling proactive management and optimization of Kafka cluster nodes.
- JVM Metrics: Offers critical insights into the performance and behavior of Java Virtual Machine instances running on Kafka cluster nodes. End-users can monitor heap and non-heap memory usage, garbage collection times, and CPU utilization to ensure optimal JVM resource utilization and stability.
At the top of the dashboard, you can apply filters to select specific hostnames and brokers. These filters allow you to focus on particular hosts or brokers, aiding in targeted analysis and troubleshooting.
Kafka Emitted Metrics
This section of the dashboard provides a detailed insight into various metrics emitted by the Kafka cluster, offering crucial information for analysis and diagnosis:
- Number of Total Topics, Kafka Brokers, Active Controller Count, and Active Controller Broker List: Understanding the distribution of topics, brokers, and the active controller count is essential for assessing the overall health and functionality of the Kafka cluster. The active controller count should ideally be 1, indicating a properly configured cluster. Any deviation from this could signify configuration issues or potential problems with cluster management.
- Under Replicated Partitions, Offline Partitions Count, and Active Controller Count: Visualizations of under-replicated partitions and offline partitions count provide insights into replication and availability issues within the cluster. An increase in under-replicated partitions may indicate broker unresponsiveness or performance degradation, while offline partition count highlights potential cluster-wide availability issues.
- ISR Shrink Rate and Expand Rate, Under Min ISR Partition Count: Monitoring in-sync replicas (ISR) and their synchronization rates is critical for ensuring data consistency and availability. Changes in ISR shrink and expand rates reflect fluctuations in replica synchronization, which can occur during broker failures or network disruptions. The under-min ISR partition count graph identifies partitions where replicas are out of sync, indicating potential data consistency issues.
- Request Queue Size, Request Handler Idle Percent, and Network Processor Idle Percent: Analyzing request queue size and handler idle percentages provides insights into broker processing efficiency and network utilization. High request queue sizes or idle percentages may indicate processing bottlenecks or network congestion, impacting Kafka's performance and responsiveness.
- Produce Requests Per Sec, Fetch Consumer Requests Per Sec, Fetch Follower Requests Per Sec: Monitoring request rates from producers, consumers, and followers helps ensure efficient communication and data transfer within the cluster. Deviations from expected request rates may indicate imbalances in producer-consumer dynamics or potential scalability issues.
- Failed Produce and Fetch Requests Per Sec: Visualizing failed produce and fetch requests per second enables detection of potential issues such as network errors, broker unavailability, or resource constraints impacting request processing.
- Total Time in ms for Fetch Consumer, Fetch Follower, and Produce: Analyzing request processing times across percentiles provides insights into request latency and performance variability. Spikes or prolonged high-percentile processing times may indicate processing bottlenecks or resource contention requiring optimization.
- Bytes In and Bytes Out Per Second: Monitoring data transfer rates enables assessment of network throughput and data ingestion/egress efficiency. Fluctuations in data transfer rates may indicate network congestion, resource limitations, or data processing bottlenecks impacting Kafka's performance.
- Messages In Per Second by Topic: Visualizing message ingestion rates by topic helps identify topic-specific data ingestion patterns and potential performance anomalies. Deviations from expected message ingestion rates may indicate issues with data producers, consumers, or topic configurations.
- Purgatory Size for Fetch and Produce: Tracking purgatory size provides insights into the number of requests awaiting processing within the Kafka broker. An increase in purgatory size may indicate processing bottlenecks or resource constraints impacting request servicing and overall cluster performance.
By leveraging the detailed insights provided by these Kafka-emitted metrics, end-users can effectively analyze and diagnose potential issues within the Kafka cluster, ensuring optimal performance, reliability, and data integrity for vuSmartMaps data streaming and processing operations.
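Most of the panels above map to well-known broker MBeans. As a hedged illustration, assuming a Jolokia agent is attached to each broker (the host and port are placeholders), the sketch below reads a few of the canonical MBeans and applies the healthy-state expectations described above: one active controller, zero under-replicated and offline partitions, and request handlers that are mostly idle.

```python
# Illustrative broker health check using canonical Kafka broker MBeans via Jolokia.
# Assumes a Jolokia agent on the broker; host/port are placeholders.
import requests

JOLOKIA_URL = "http://broker-1:8778/jolokia/read"  # placeholder broker host/port

CHECKS = [
    # (mbean, attribute, description, healthy-if predicate)
    ("kafka.controller:type=KafkaController,name=ActiveControllerCount", "Value",
     "active controllers on this broker (cluster-wide the sum should be 1)", lambda v: v in (0, 1)),
    ("kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions", "Value",
     "under-replicated partitions", lambda v: v == 0),
    ("kafka.controller:type=KafkaController,name=OfflinePartitionsCount", "Value",
     "offline partitions", lambda v: v == 0),
    ("kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent", "OneMinuteRate",
     "request handler average idle fraction", lambda v: v > 0.3),
]

for mbean, attribute, description, healthy in CHECKS:
    resp = requests.get(f"{JOLOKIA_URL}/{mbean}/{attribute}", timeout=5)
    resp.raise_for_status()
    value = resp.json()["value"]
    status = "OK" if healthy(value) else "CHECK"
    print(f"[{status}] {description}: {value}")
```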
Host Level Metrics
The Host Level Metrics section of the dashboard offers crucial insights into the performance and health of individual Kafka cluster nodes, enabling end-users to analyze and diagnose potential issues at the host level:
- Memory Usage: Monitoring memory usage metrics allows end-users to assess resource utilization and identify potential memory-related issues such as memory leaks or inadequate resource allocation. Visualizations of memory utilization over time help detect trends and abnormalities, facilitating proactive resource management and optimization.
- CPU Usage: Analysis of CPU usage metrics provides insights into processing load and resource utilization on individual host machines. High CPU usage percentages may indicate processing bottlenecks or resource contention, prompting further investigation and optimization to ensure optimal performance.
- Disk Space: Monitoring disk space usage enables end-users to ensure sufficient storage capacity and detect potential issues such as disk space constraints that could impact Kafka's operation. Visualizations of disk space utilization over time help identify trends and predict potential storage shortages, allowing for proactive capacity planning and management.
- Network: Analyzing network metrics provides insights into network throughput and communication efficiency between Kafka cluster nodes. Visualizations of bytes received and sent per second, along with detailed tables showing network interface metrics, help detect anomalies such as packet loss or network congestion, facilitating troubleshooting and optimization of network performance.
By leveraging the insights provided by the Host Level Metrics section, end-users can effectively monitor individual host performance and identify potential issues impacting Kafka cluster operation, ensuring optimal performance and reliability.
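As a rough illustration of the same host-level signals (not the dashboard's actual collection agent), the snippet below takes a one-shot snapshot of memory, CPU, disk, and network counters with psutil; the disk mount point is a placeholder.

```python
# Illustrative host-level snapshot with psutil: memory, CPU, disk, and network.
# This is not the dashboard's collection agent, just the same signals at the source.
import psutil

mem = psutil.virtual_memory()
print(f"memory: {mem.percent:.1f}% used ({mem.used / 1e9:.1f} GB of {mem.total / 1e9:.1f} GB)")

print(f"cpu: {psutil.cpu_percent(interval=1):.1f}% over 1s")

disk = psutil.disk_usage("/data/kafka")  # placeholder: Kafka log directory mount
print(f"disk: {disk.percent:.1f}% used")

net = psutil.net_io_counters()
print(f"network: {net.bytes_recv} bytes received, {net.bytes_sent} bytes sent since boot")
```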
JVM Metrics
The JVM Metrics section of the dashboard offers critical insights into the performance and behavior of the Java Virtual Machine (JVM) instances running on Kafka cluster nodes, enabling end-users to monitor JVM health and diagnose potential issues:
- Heap & Non-heap Memory Usage and Garbage Collection: Monitoring heap and non-heap memory usage, along with garbage collection times (Young and Old), helps ensure optimal JVM resource utilization and stability. Visualizations of memory usage and garbage collection times over time enable end-users to detect trends and anomalies indicative of memory leaks, inefficient resource management, or garbage collection tuning issues requiring attention and optimization.
- Visualization of JVM Metrics: Visualizing JVM metrics such as heap memory usage, garbage collection times, and CPU utilization over time provides end-users with insights into JVM behavior and performance trends. By proactively monitoring and analyzing JVM metrics, end-users can identify and address potential performance bottlenecks or stability issues, ensuring optimal Kafka cluster operation and reliability.
By leveraging the insights provided by the JVM Metrics section, end-users can effectively monitor JVM health, diagnose potential issues, and optimize JVM performance to ensure optimal operation of the Kafka cluster.
Conclusion
In conclusion, the Kafka Cluster Monitoring dashboard serves as a vital tool for solutioning engineers to ensure the optimal performance and reliability of the Kafka cluster underlying vuSmartMaps. By providing detailed insights into key metrics such as host-level resource utilization, JVM health, and Kafka-emitted metrics, this dashboard empowers engineers to proactively identify, analyze, and address potential issues impacting data streaming and processing operations. Through continuous monitoring and analysis of these metrics, engineers can maintain the stability, scalability, and efficiency of the Kafka cluster, thereby contributing to the seamless operation of vuSmartMaps' data infrastructure.
Kafka Connect Monitoring
The Kafka Connect dashboard gives a view of the Kafka Connect cluster running in vuSmartMaps. The Kafka Connect cluster manages the connectors that either source data from different databases into Kafka or sink data from Kafka to other databases. It provides information about different connectors and their status, rate of incoming and outgoing data via the connectors, rate of polling and writing records, CPU and memory utilization by the Connect cluster, and other JVM metrics.
Accessing Kafka Connect Monitoring Dashboard
To access the Kafka Connect Monitoring Dashboard:
- Navigate to the left navigation menu and click on Dashboards.
- Run a search for the Kafka Connect Monitoring Dashboard.
- Click on the Kafka Connect Monitoring Dashboard to access it.
Dashboard’s Panels
The Kafka Connect Monitoring dashboard is divided into the following sections:
- Kafka Connect Metrics: Tracks the total number of connectors, tasks, and failed tasks, along with detailed statuses for active connectors and tasks, facilitating troubleshooting and debugging.
- Connector Metrics: Offers insights into data throughput, batch processing efficiency, and source and sink connector performance, aiding in the analysis and optimization of individual connectors.
- Kafka Connect Node Metrics: Monitors resource utilization, CPU usage percentiles, and data transfer rates at the node level, enabling identification of resource constraints and performance bottlenecks within the Kafka Connect cluster.
- JVM Metrics: Provides critical insights into JVM health and performance, including memory usage, garbage collection times, and CPU utilization, facilitating proactive monitoring and diagnosis of potential issues impacting Kafka Connect operations.
At the top of the dashboard, you can apply filters to select specific connectors, workers, and nodes. These filters allow you to focus on particular DataStore connectors, aiding in targeted analysis and troubleshooting.
Kafka Connect Metrics
- Connector Count: Indicates the total number of connectors currently active within the Kafka Connect cluster. A sudden decrease or increase in connector count may indicate issues with connector configuration or deployment.
- Task Count: Displays the total number of tasks currently running within the Kafka Connect cluster. Monitoring task count helps ensure that all tasks are executing as expected and identifies any tasks that may have failed or stalled.
- Failed Task Count: This shows the total number of tasks that have failed within the Kafka Connect cluster. Identifying failed tasks is crucial for troubleshooting and resolving issues that may impact data integration and processing.
- Active Connector Status: Presents a detailed overview of active connectors within the Kafka Connect cluster, including their type and current status. Engineers can use this information to troubleshoot specific connectors and address any issues affecting their functionality.
- Connector Task Status: Provides detailed information about the status of tasks associated with each connector, including total expected tasks, currently running tasks, and failed tasks. Monitoring task status helps identify and resolve issues at the task level, ensuring smooth data flow within the Kafka Connect cluster.
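The connector and task counts and statuses in this section correspond to what the Kafka Connect REST API reports. The minimal sketch below (the Connect worker URL is a placeholder) lists deployed connectors and flags any connector or task that is not in the RUNNING state.

```python
# Minimal sketch: list Kafka Connect connectors and flag failed connectors/tasks
# using the Connect REST API. The worker URL is a placeholder.
import requests

CONNECT_URL = "http://localhost:8083"  # placeholder Kafka Connect worker URL

connectors = requests.get(f"{CONNECT_URL}/connectors", timeout=10).json()
print(f"{len(connectors)} connectors deployed")

for name in connectors:
    status = requests.get(f"{CONNECT_URL}/connectors/{name}/status", timeout=10).json()
    conn_state = status["connector"]["state"]
    not_running = [t for t in status["tasks"] if t["state"] != "RUNNING"]
    marker = "OK" if conn_state == "RUNNING" and not not_running else "CHECK"
    print(f"[{marker}] {name}: connector={conn_state}, "
          f"tasks={len(status['tasks'])}, not running={len(not_running)}")
```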
Connector Metrics
- Number of Sourced and Sinked Records: Displays the total number of records sourced into Kafka and sinked from Kafka by each connector. Monitoring record throughput helps assess connector performance and identify any anomalies or bottlenecks in data flow.
- Batch Size: This shows the average and maximum batch size of records processed by each connector. Monitoring batch size helps optimize data transfer efficiency and identify any issues related to batch processing.
- Source Metrics (Record Poll Rate, Average Batch Poll Time, Record Write Rate): Provides insights into the performance of source connectors, including record poll rate, average batch poll time, and record write rate. Monitoring these metrics helps assess source connector efficiency and identify any issues affecting data ingestion from external systems to Kafka.
- Sink Metrics (Record Read Rate, Record Send Rate, Average Batch Write Timestamp): Offers insights into the performance of sink connectors, including record read rate, record send rate, and average batch write timestamp. Monitoring these metrics helps assess sink connector efficiency and identify any issues affecting data transfer from Kafka to external systems.
Kafka Connect Node Metrics
- Memory and CPU Usage: Displays the maximum and average memory and CPU usage of individual nodes within the Kafka Connect cluster. Monitoring memory and CPU usage helps identify resource constraints and performance bottlenecks at the node level.
- CPU Usage Percentile: Presents CPU usage percentiles for each node within the Kafka Connect cluster, including the 75th, 90th, and 95th percentiles. Monitoring CPU usage percentiles helps assess node performance and identify any nodes experiencing high CPU utilization.
- Incoming and Outgoing Byte Rate: Shows the incoming and outgoing byte rates for data transfer to and from Kafka on each node within the Kafka Connect cluster. Monitoring byte rates helps assess data throughput and identify any issues affecting data transfer efficiency.
JVM Metrics
- Heap and Non-heap Memory Usage: Displays the heap and non-heap memory usage of Java Virtual Machine (JVM) instances running Kafka Connect. Monitoring memory usage helps assess JVM resource utilization and identify any memory-related issues, such as memory leaks or inefficient resource management.
- Garbage Collection Times (Young and Old): Presents garbage collection times for both young and old generation memory within JVM instances running Kafka Connect. Monitoring garbage collection times helps assess JVM performance and identify any issues related to garbage collection efficiency.
Conclusion
In conclusion, the Kafka Connect Monitoring dashboard serves as a comprehensive tool for solutioning engineers to maintain the integrity and efficiency of the Kafka Connect cluster within vuSmartMaps. By providing detailed insights into connector and node metrics, along with critical JVM health indicators, this dashboard empowers engineers to proactively monitor, troubleshoot, and optimize the Kafka Connect environment. With a focus on data throughput, resource utilization, and JVM performance, this dashboard enables seamless data integration and processing, ensuring the reliability and scalability of vuSmartMaps' data infrastructure.