Version: NG-2.14

Kafka

Introduction

Kafka monitoring involves collecting and analyzing metrics related to the performance and health of Apache Kafka clusters. By monitoring these metrics, administrators can identify bottlenecks, optimize resource usage, detect potential issues like lag or broker failure, and ensure smooth and reliable message streaming and processing. This proactive approach helps in maintaining the overall efficiency, scalability, and stability of the Kafka environment.

Getting Started

Compatibility

The Kafka O11ySource is designed to work with all versions greater than or equal to 7, and it has been tested with Kafka 7.3.

Data Collection Method

The Kafka O11ySource collects Kafka Broker, Kafka Consumer Group, and Kafka Zookeeper metrics in both standalone and cluster mode.

vuSmartMaps uses the vumetric agent to collect Kafka Broker, Kafka Consumer Group, and Kafka Zookeeper metrics.

Prerequisites

Inputs for Configuring Data Source

  • Instance Name: Enter a name for the Kafka Package instance. This should be a unique identifier for the specific Kafka cluster package deployment you want to monitor.
  • Kafka Cluster ID: Cluster ID for which the package is being created.
  • Package Type: Select the package type to be deployed.
  • Kafka Script Path: Path of the Kafka scripts (e.g. /bin/).
  • Kafka Host: IP address on which the Kafka broker is exposed.
  • Kafka Port: Port on which the Kafka broker is exposed.
  • Kafka Broker ID: Enter a name to uniquely identify the Kafka broker instance.
  • Zookeeper Host: IP address on which Zookeeper runs.
  • Zookeeper Port: Port on which Zookeeper is exposed.
  • Zookeeper Keeper ID: Enter a name to uniquely identify the Zookeeper instance.
  • Jolokia URL: Enter the Jolokia URL used to fetch Kafka metrics.
  • Polling Interval [seconds]: How frequently data is gathered. The interval should be between 60 and 86400 seconds.
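
As a rough illustration of the constraints listed above, the following sketch checks a handful of the inputs before they are entered in the form. It is not part of vuSmartMaps: the `KafkaSourceInputs` class, its field names, and the `validate` function are hypothetical, and only the 60 to 86400 second polling range comes from this page.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse

# Polling interval bounds as stated in the input list above.
MIN_POLL_SECONDS = 60
MAX_POLL_SECONDS = 86400


@dataclass
class KafkaSourceInputs:
    """Hypothetical container for a subset of the form fields."""
    instance_name: str
    kafka_host: str
    kafka_port: int
    jolokia_url: str
    polling_interval: int


def validate(inputs: KafkaSourceInputs) -> List[str]:
    """Return a list of problems; an empty list means the inputs look sane."""
    problems = []
    if not inputs.instance_name.strip():
        problems.append("Instance Name must not be empty")
    if not 1 <= inputs.kafka_port <= 65535:
        problems.append("Kafka Port must be between 1 and 65535")
    parsed = urlparse(inputs.jolokia_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("Jolokia URL must be an http(s) URL")
    if not MIN_POLL_SECONDS <= inputs.polling_interval <= MAX_POLL_SECONDS:
        problems.append("Polling Interval must be between 60 and 86400 seconds")
    return problems
```

For example, `validate(KafkaSourceInputs("kafka-prod", "10.0.0.5", 9092, "http://10.0.0.5:8778/jolokia", 60))` returns an empty list; the host, port, and URL here are placeholder values.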

Firewall Requirement

To collect data from this O11ySource, ensure the following ports are opened:

| Source IP | Destination IP | Destination Port | Protocol | Direction |
| --- | --- | --- | --- | --- |
| vuSmartMaps IP | IP address of the Kafka server | 9092, 2181, 8778* | TCP | Outbound |
| IP address of the Kafka server | vuSmartMaps Kafka Broker IP | 9092* | TCP | Inbound |

*Before providing the firewall requirements, please update the port based on the customer environment.
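
One way to confirm that the firewall rules are in place is to attempt a TCP connection to each port from the vuSmartMaps side. The sketch below uses only the Python standard library; `check_ports` is a hypothetical helper, not a vuSmartMaps tool, and the port numbers should be replaced with the ones used in your environment.

```python
import socket
from typing import Dict, Iterable


def check_ports(host: str, ports: Iterable[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Try a TCP connect to each port and report whether it succeeded."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or no route.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

Calling `check_ports("<Kafka server IP>", [9092, 2181, 8778])` returns a mapping such as `{9092: True, 2181: True, 8778: False}`, where a `False` entry points to a closed port or a service that is not listening. A successful connect only shows that the port is reachable; it does not verify that Kafka, Zookeeper, or Jolokia is answering correctly.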

Configuring the Target

Configure Metrics Collection from Kafka Server

  • On each Kafka instance, port 9092 should be open for external requests. The following metrics are collected from the running Kafka instance:
      • Partition Metrics - Crucial for monitoring the performance and health of topics within a cluster. Key metrics include under-replicated partitions, which indicate potential data loss risks, and messages in/out rates, which help assess producer and consumer performance. Regularly tracking these metrics supports effective load balancing and data availability.
      • Consumer Group Metrics - Provide insights into the performance and behavior of consumer applications. Key metrics include consumer lag, which measures the difference between the latest message offset and the last consumed offset and shows whether consumers are keeping up with producers. Other important metrics include the rate of messages consumed and the time taken for processing, which help assess the efficiency and responsiveness of consumer groups.
  • On each Kafka instance, port 8778 should be open for external requests with Jolokia metrics enabled.
      • Jolokia Metrics - Cover essential performance indicators of a Kafka server, including memory usage, garbage collection counts, and data throughput metrics like BytesInPerSec and BytesOutPerSec. Key metrics also include consumer and producer request rates, under-replicated partitions, and leader election rates, providing a comprehensive view of the broker's health and operational efficiency.
  • On each Zookeeper instance, port 2181 should be open for external requests with Jolokia metrics enabled.
      • Zookeeper Metrics - Include connection statistics like num_alive_connections and packet counts (packets_received, packets_sent), which assess communication load. Znode and node counts indicate the complexity of the data structure, and latency metrics (latency_min, latency_max, latency_avg) measure server response times. File descriptor metrics offer insights into resource utilization.
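
The consumer lag definition given above (latest message offset minus last consumed offset) can be written out as a small calculation. This is a minimal sketch, not the agent's implementation: the function names and the `(topic, partition)` dictionary layout are assumptions made for illustration, and clamping negative values to zero is a design choice of this example.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]


def consumer_lag(newest_offset: int, consumed_offset: int) -> int:
    """Lag for one partition: newest offset minus the last consumed offset.

    Clamped at zero, because the two offsets may be sampled at slightly
    different moments and the raw difference can briefly go negative.
    """
    return max(newest_offset - consumed_offset, 0)


def total_lag(newest: Dict[TopicPartition, int],
              consumed: Dict[TopicPartition, int]) -> int:
    """Sum the lag over every partition the consumer group has an offset for."""
    return sum(
        consumer_lag(newest[tp], offset)
        for tp, offset in consumed.items()
        if tp in newest  # skip partitions with no known newest offset
    )
```

With newest offsets `{("orders", 0): 120, ("orders", 1): 80}` and consumed offsets `{("orders", 0): 100, ("orders", 1): 80}`, `total_lag` returns 20: partition 0 is 20 messages behind and partition 1 is fully caught up. A lag that keeps growing across polling intervals suggests consumers are not keeping up with producers.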

Configuration Steps

  • Enable the O11ySource.
  • Select the Sources tab and press the + button to add details of the Kafka instance to be monitored.
  • Set up the metrics collection configuration, which includes the cluster ID and the package type. The O11ySource is preconfigured with the metrics to collect, though you can adjust the metric collection intervals. Afterwards, select Save and Continue to proceed with downloading the Healthbeat agent.
  • The following packages are available for download, based on the OS:
      • <Healthbeat full install package> - Downloads the full Healthbeat agent package with the required configurations for a fresh installation.
      • <Healthbeat config update package> - Downloads the Healthbeat agent configuration package to update an existing Logbeat installation.
  • Download the agent installation or update package, then click Finish to close the data source window.

Metrics Collected

| Name | Description | Data Type |
| --- | --- | --- |
| timestamp | Detailed timestamp | DateTime64 |
| tenant_id | Tenant ID | LowCardinality(String) |
| bu_id | Business unit ID | LowCardinality(String) |
| host | Hostname of the server | LowCardinality(String) |
| target | Target server or service | LowCardinality(String) |
| cluster_id | Cluster ID | LowCardinality(String) |
| broker_id | Broker ID | LowCardinality(String) |
| sub_type | Subtype of the metric | LowCardinality(String) |
| no_of_brokers | Number of brokers | UInt64 |
| brokerStatus | Status of the broker | LowCardinality(String) |
| kafka_broker_id | Kafka broker ID | UInt64 |
| kafka_host_address | Kafka host address | LowCardinality(String) |

| Name | Description | Data Type |
| --- | --- | --- |
| timestamp | Detailed timestamp | DateTime64 |
| tenant_id | Tenant ID | LowCardinality(String) |
| bu_id | Business unit ID | LowCardinality(String) |
| host | Hostname of the server | LowCardinality(String) |
| target | Target server or service | LowCardinality(String) |
| sub_type | Subtype of the metric | LowCardinality(String) |
| cluster_id | Cluster ID | LowCardinality(String) |
| kafka_broker_id | Kafka broker ID | UInt64 |
| kafka_broker_address | Kafka broker address | LowCardinality(String) |
| kafka_topic_name | Kafka topic name | LowCardinality(String) |
| kafka_partition_topic_broker_id | Kafka partition topic broker ID | LowCardinality(String) |
| kafka_partition_partition_is_leader | Indicates if the partition is leader | Boolean |
| kafka_partition_partition_insync_replica | Indicates if the partition is an in-sync replica | Boolean |
| kafka_partition_partition_leader | Leader of the partition | UInt64 |
| kafka_partition_partition_replica | Replica of the partition | UInt64 |
| kafka_partition_offset_newest | Newest offset in the partition | UInt64 |
| kafka_partition_offset_oldest | Oldest offset in the partition | UInt64 |
| kafka_partition_id | Partition ID | UInt64 |
| kafka_partition_topic_id | Kafka partition topic ID | LowCardinality(String) |
| kafka_consumergroup_offset | Consumer group offset | Int64 |
| kafka_consumergroup_meta | Consumer group metadata | String |
| kafka_consumergroup_consumer_lag | Consumer group lag | UInt64 |
| kafka_consumergroup_error_code | Consumer group error code | UInt64 |
| kafka_consumergroup_client_host | Consumer group client host | LowCardinality(String) |
| kafka_consumergroup_client_member_id | Consumer group client member ID | String |
| kafka_consumergroup_client_id | Consumer group client ID | String |
| kafka_consumergroup_id | Consumer group ID | String |

| Name | Description | Data Type |
| --- | --- | --- |
| timestamp | Detailed timestamp | DateTime64 |
| tenant_id | Tenant ID | LowCardinality(String) |
| bu_id | Business unit ID | LowCardinality(String) |
| host | Hostname of the server | LowCardinality(String) |
| target | Target server or service | LowCardinality(String) |
| cluster_id | Cluster ID | LowCardinality(String) |
| message | Message details | String |
| GROUP | Consumer group | String |
| LAG | Lag of the consumer group | UInt64 |
| cOFFSET | Current offset | UInt64 |
| eOFFSET | End offset | UInt64 |
| cnID | Consumer node ID | String |
| HOST | Host of the consumer | String |
| TOPIC | Topic name | String |
| PARTITION | Partition number | UInt64 |
| clID | Client ID | String |

| Name | Description | Data Type |
| --- | --- | --- |
| timestamp | Detailed timestamp | DateTime64 |
| tenant_id | Tenant ID | LowCardinality(String) |
| bu_id | Business unit ID | LowCardinality(String) |
| host | Hostname of the server | LowCardinality(String) |
| target | Target server or service | LowCardinality(String) |
| cluster_id | Cluster ID | LowCardinality(String) |
| broker_id | Broker ID | LowCardinality(String) |
| type | Type of the metric | LowCardinality(String) |
| sub_type | Subtype of the metric | LowCardinality(String) |
| metric_type | Type of the metric | LowCardinality(String) |
| mbeans | MBean information | String |
| CollectionCount | Count of collections | UInt64 |
| CollectionCount_diff | Collection count difference | UInt64 |
| CollectionTime | Time spent in collection | UInt64 |
| CollectionTime_diff | Collection time difference | UInt64 |
| topic | Topic name | String |
| bytes_per_sec | Bytes per second | Float64 |
| messages_per_sec | Messages per second | Float64 |
| jolokia_metrics_Threading_ThreadCount | Thread count in Threading metrics | UInt64 |
| jolokia_metrics_Threading_TotalStartedThreadCount | Total started thread count in Threading metrics | UInt64 |
| jolokia_metrics_Threading_PeakThreadCount | Peak thread count in Threading metrics | UInt64 |
| jolokia_metrics_Threading_DaemonThreadCount | Daemon thread count in Threading metrics | UInt64 |
| jolokia_metrics_broker_IsrShrinks_OneMinuteRate | ISR shrinks one minute rate | Float64 |
| jolokia_metrics_broker_IsrShrinks_FiveMinuteRate | ISR shrinks five minute rate | Float64 |
| jolokia_metrics_broker_ActiveControllerCount | Active controller count | UInt64 |
| jolokia_metrics_broker_DelayedOperationPurgatory_Produce | Delayed operation purgatory for produce | UInt64 |
| jolokia_metrics_broker_DelayedOperationPurgatory_Fetch | Delayed operation purgatory for fetch | UInt64 |
| jolokia_metrics_broker_GlobalPartitionCount | Global partition count | UInt64 |
| jolokia_metrics_broker_TotalTimeMs_Produce_50thPercentile | Total time for produce (50th percentile) | Float64 |
| jolokia_metrics_broker_TotalTimeMs_Produce_Mean | Mean total time for produce | Float64 |
| jolokia_metrics_broker_TotalTimeMs_Produce_75thPercentile | Total time for produce (75th percentile) | Float64 |
| jolokia_metrics_broker_TotalTimeMs_Produce_Min | Minimum total time for produce | UInt64 |
| jolokia_metrics_broker_TotalTimeMs_Produce_95thPercentile | Total time for produce (95th percentile) | Float64 |
| jolokia_metrics_broker_TotalTimeMs_Produce_99thPercentile | Total time for produce (99th percentile) | Float64 |
| jolokia_metrics_broker_TotalTimeMs_Produce_Max | Maximum total time for produce | UInt64 |
| jolokia_metrics_broker_TotalTimeMs_Produce_Count | Count of total time for produce | Float64 |
| jolokia_metrics_broker_TotalTimeMs_FetchConsumer_Min | Minimum total time for fetch consumer | UInt64 |
| jolokia_metrics_broker_TotalTimeMs_FetchConsumer_95thPercentile | Total time for fetch consumer (95th percentile) | Float64 |
| jolokia_metrics_broker_TotalTimeMs_FetchConsumer_99thPercentile | Total time for fetch consumer (99th percentile) | Float64 |
| jolokia_metrics_broker_TotalTimeMs_FetchConsumer_Max | Maximum total time for fetch consumer | UInt64 |
| jolokia_metrics_broker_TotalTimeMs_FetchConsumer_Count | Count of total time for fetch consumer | Float64 |
| jolokia_metrics_broker_TotalTimeMs_FetchConsumer_50thPercentile | Total time for fetch consumer (50th percentile) | Float64 |
| jolokia_metrics_broker_TotalTimeMs_FetchConsumer_Mean | Mean total time for fetch consumer | Float64 |
| jolokia_metrics_broker_TotalTimeMs_FetchConsumer_75thPercentile | Total time for fetch consumer (75th percentile) | Float64 |
| jolokia_metrics_broker_TotalTimeMs_FetchFollower_99thPercentile | Total time for fetch follower (99th percentile) | Float64 |
| jolokia_metrics_broker_TotalTimeMs_FetchFollower_Max | Maximum total time for fetch follower | UInt64 |
| jolokia_metrics_broker_TotalTimeMs_FetchFollower_Count | Count of total time for fetch follower | UInt64 |
| jolokia_metrics_broker_TotalTimeMs_FetchFollower_50thPercentile | Total time for fetch follower (50th percentile) | Float64 |
| jolokia_metrics_broker_TotalTimeMs_FetchFollower_Mean | Mean total time for fetch follower | Float64 |
| jolokia_metrics_broker_TotalTimeMs_FetchFollower_75thPercentile | Total time for fetch follower (75th percentile) | Float64 |
| jolokia_metrics_broker_TotalTimeMs_FetchFollower_Min | Minimum total time for fetch follower | UInt64 |
| jolokia_metrics_broker_TotalTimeMs_FetchFollower_95thPercentile | Total time for fetch follower (95th percentile) | Float64 |
| jolokia_metrics_broker_IsrExpands_FiveMinuteRate | ISR expands five minute rate | Float64 |
| jolokia_metrics_broker_IsrExpands_OneMinuteRate | ISR expands one minute rate | Float64 |
| jolokia_metrics_broker_OfflinePartitionsCount | Count of offline partitions | UInt64 |
| jolokia_metrics_broker_RequestsPerSec_FetchConsumer_FifteenMinuteRate | Fetch consumer requests per second (15-minute rate) | Float64 |
| jolokia_metrics_broker_RequestsPerSec_FetchConsumer_FiveMinuteRate | Fetch consumer requests per second (5-minute rate) | Float64 |
| jolokia_metrics_broker_RequestsPerSec_FetchConsumer_MeanRate | Fetch consumer requests per second (mean rate) | Float64 |
| jolokia_metrics_broker_RequestsPerSec_FetchConsumer_OneMinuteRate | Fetch consumer requests per second (1-minute rate) | Float64 |
| jolokia_metrics_broker_RequestsPerSec_FetchFollower_FifteenMinuteRate | Fetch follower requests per second (15-minute rate) | Float64 |
| jolokia_metrics_broker_RequestsPerSec_FetchFollower_FiveMinuteRate | Fetch follower requests per second (5-minute rate) | Float64 |
| jolokia_metrics_broker_RequestsPerSec_FetchFollower_MeanRate | Fetch follower requests per second (mean rate) | Float64 |
| jolokia_metrics_broker_RequestsPerSec_FetchFollower_OneMinuteRate | Fetch follower requests per second (1-minute rate) | Float64 |
| jolokia_metrics_broker_RequestsPerSec_Produce_FifteenMinuteRate | Produce requests per second (15-minute rate) | Float64 |
| jolokia_metrics_broker_RequestsPerSec_Produce_FiveMinuteRate | Produce requests per second (5-minute rate) | Float64 |
| jolokia_metrics_broker_RequestsPerSec_Produce_MeanRate | Produce requests per second (mean rate) | Float64 |
| jolokia_metrics_broker_RequestsPerSec_Produce_OneMinuteRate | Produce requests per second (1-minute rate) | Float64 |
| jolokia_metrics_broker_LeaderElection_FifteenMinuteRate | Leader election rate (15-minute rate) | Float64 |
| jolokia_metrics_broker_LeaderElection_FiveMinuteRate | Leader election rate (5-minute rate) | Float64 |
| jolokia_metrics_broker_LeaderElection_MeanRate | Leader election rate (mean rate) | Float64 |
| jolokia_metrics_broker_LeaderElection_OneMinuteRate | Leader election rate (1-minute rate) | Float64 |
| jolokia_metrics_broker_UncleanLeaderElection_MeanRate | Unclean leader election rate (mean rate) | Float64 |
| jolokia_metrics_broker_UncleanLeaderElection_OneMinuteRate | Unclean leader election rate (1-minute rate) | Float64 |
| jolokia_metrics_broker_UncleanLeaderElection_FifteenMinuteRate | Unclean leader election rate (15-minute rate) | Float64 |
| jolokia_metrics_broker_UncleanLeaderElection_FiveMinuteRate | Unclean leader election rate (5-minute rate) | Float64 |
| jolokia_metrics_broker_topic_net_failed_fetch_request_per_sec | Failed fetch requests per second for topic | Float64 |
| jolokia_metrics_broker_topic_net_failed_produce_request_per_sec | Failed produce requests per second for topic | Float64 |
| jolokia_metrics_broker_topic_net_produce_request_per_sec | Produce requests per second for topic | Float64 |
| jolokia_metrics_broker_UnderReplicatedPartitions | Under-replicated partitions count | UInt64 |
| jolokia_metrics_KafkaServer_BrokerState | Broker state | UInt64 |
| jolokia_metrics_Memory_HeapMemoryUsage_max | Max heap memory usage | Float64 |
| jolokia_metrics_Memory_HeapMemoryUsage_used | Used heap memory | Float64 |
| jolokia_metrics_Memory_HeapMemoryUsage_init | Initial heap memory | Float64 |
| jolokia_metrics_Memory_HeapMemoryUsage_committed | Committed heap memory | Float64 |
| jolokia_metrics_Memory_NonHeapMemoryUsage_used | Used non-heap memory | Float64 |
| jolokia_metrics_Memory_NonHeapMemoryUsage_init | Initial non-heap memory | Float64 |
| jolokia_metrics_Memory_NonHeapMemoryUsage_committed | Committed non-heap memory | Float64 |
| jolokia_metrics_Memory_NonHeapMemoryUsage_max | Max non-heap memory | Int64 |

| Name | Description | Data Type |
| --- | --- | --- |
| timestamp | Detailed timestamp | DateTime64 |
| tenant_id | Tenant ID | LowCardinality(String) |
| bu_id | Business unit ID | LowCardinality(String) |
| host | Hostname of the server | LowCardinality(String) |
| target | Target server or service | LowCardinality(String) |
| sub_type | Subtype of the data | LowCardinality(String) |
| cluster_id | Cluster ID | LowCardinality(String) |
| keeper_id | Keeper ID | LowCardinality(String) |
| service_node_name | Name of the service node | LowCardinality(String) |
| service_address | Service address | LowCardinality(String) |
| zookeeper_mntr_server_state | Zookeeper server state | LowCardinality(String) |
| zookeeper_mntr_num_alive_connections | Number of alive connections to Zookeeper | UInt64 |
| zookeeper_mntr_znode_count | Count of znodes in Zookeeper | UInt64 |
| zookeeper_mntr_approximate_data_size | Approximate data size in Zookeeper | UInt64 |
| zookeeper_mntr_max_file_descriptor_count | Maximum number of file descriptors in Zookeeper | UInt64 |
| zookeeper_mntr_packets_received | Packets received by Zookeeper | UInt64 |
| zookeeper_mntr_packets_sent | Packets sent by Zookeeper | UInt64 |
| zookeeper_mntr_watch_count | Watch count in Zookeeper | UInt64 |
| zookeeper_mntr_outstanding_requests | Outstanding requests in Zookeeper | UInt64 |
| zookeeper_mntr_open_file_descriptor_count | Open file descriptor count in Zookeeper | UInt64 |
| zookeeper_mntr_ephemerals_count | Count of ephemeral nodes in Zookeeper | UInt64 |
| zookeeper_mntr_latency_min | Minimum latency in Zookeeper | UInt64 |
| zookeeper_mntr_latency_max | Maximum latency in Zookeeper | UInt64 |
| zookeeper_mntr_latency_avg | Average latency in Zookeeper | Float64 |
| zookeeper_server_mode | Mode of the Zookeeper server | LowCardinality(String) |
| zookeeper_server_outstanding | Outstanding requests on the Zookeeper server | UInt64 |
| zookeeper_server_count | Zookeeper server count | UInt64 |
| zookeeper_server_epoch | Zookeeper server epoch | UInt64 |
| zookeeper_server_received | Packets received by Zookeeper server | UInt64 |
| zookeeper_server_zxid | Zookeeper transaction ID (ZXID) | String |
| zookeeper_server_node_count | Node count on the Zookeeper server | UInt64 |
| zookeeper_server_sent | Packets sent by Zookeeper server | UInt64 |
| zookeeper_server_connections | Connections to Zookeeper server | UInt64 |
| zookeeper_mntr_packets_received_diff | Difference in received packets by Zookeeper (previous to current) | Int64 |
| zookeeper_mntr_packets_sent_diff | Difference in sent packets by Zookeeper (previous to current) | Int64 |
| zookeeper_server_sent_diff | Difference in packets sent by Zookeeper server (previous to current) | Int64 |
| zookeeper_server_received_diff | Difference in packets received by Zookeeper server (previous to current) | Int64 |
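
The `zookeeper_mntr_*` names suggest that these values come from ZooKeeper's `mntr` four-letter command, and the `_diff` columns are described as the change from the previous sample to the current one. The sketch below illustrates both ideas under those assumptions; it is not the agent's code. `parse_mntr` and `counter_diff` are hypothetical helpers. Keys in raw `mntr` output (for example `zk_packets_received`, `zk_avg_latency`) do not match the column names one to one, and this page does not specify how the agent maps them.

```python
from typing import Dict, Union

Value = Union[int, float, str]


def _coerce(text: str) -> Value:
    """Convert a raw value to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def parse_mntr(raw: str) -> Dict[str, Value]:
    """Parse `mntr` output (one whitespace-separated key/value pair per line)."""
    stats: Dict[str, Value] = {}
    for line in raw.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        if key.startswith("zk_"):
            key = key[len("zk_"):]  # drop the prefix ZooKeeper adds to each key
        stats[key] = _coerce(value.strip())
    return stats


def counter_diff(previous: Dict[str, Value], current: Dict[str, Value], key: str) -> int:
    """Change in a cumulative counter between two samples.

    The result can be negative if the server restarted and its counters
    were reset, which is consistent with the signed Int64 type of the
    `_diff` columns.
    """
    return int(current[key]) - int(previous[key])
```

Given two samples in which `zk_packets_received` goes from 100 to 160, `counter_diff(parse_mntr(first), parse_mntr(second), "packets_received")` yields 60, the number of packets received during that polling interval.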