Monitor Kafka with OpenTelemetry


Apache Kafka is a widely used distributed streaming platform known for its high throughput, fault tolerance, and scalability.

Using the OpenTelemetry Collector Kafka metrics receiver, you can collect metrics from your Kafka cluster and send them to your observability backend for analysis and visualization.

What is OpenTelemetry Collector?

You can deploy the OpenTelemetry Collector as an agent on individual hosts. There it periodically collects diagnostic information about the running system and forwards it to various distributed tracing tools.

OpenTelemetry Collector provides powerful data processing capabilities. It can aggregate, filter, transform, and enrich telemetry data as it flows through the system.

With the OpenTelemetry Collector, you can collect telemetry data from your Kafka clusters and send it to the OpenTelemetry backend of your choice. This gives you insight into the behavior and performance of your Kafka messaging system: you can monitor message processing times, track message flows, and analyze the overall health of your Kafka-based applications.

OpenTelemetry Kafka receiver

Monitoring Apache Kafka is critical to ensuring the health, performance, and reliability of your Kafka cluster.

Monitoring Kafka metrics helps identify performance bottlenecks, resource utilization issues, and potential inefficiencies within your Kafka cluster. By tracking metrics such as CPU usage, disk utilization, network traffic, and message rates, you can tune your Kafka deployment for performance and scalability.

To start monitoring Kafka, configure the Kafka metrics receiver (kafkametrics) in /etc/otelcol-contrib/config.yaml and point the OTLP exporter at Uptrace using your Uptrace DSN:

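receivers:
  # OTLP receiver for traces and metrics sent by instrumented applications
  otlp:
    protocols:
      grpc:
      http:
  # Scrapes broker, topic, and consumer group metrics from the Kafka cluster
  kafkametrics:
    brokers: localhost:9092
    protocol_version: 2.0.0
    scrapers:
      - brokers
      - topics
      - consumers

exporters:
  # Sends telemetry to Uptrace over OTLP/gRPC; replace <FIXME> with your project DSN
  otlp/uptrace:
    endpoint: otlp.uptrace.dev:4317
    headers: { 'uptrace-dsn': '<FIXME>' }

processors:
  # Adds host and environment attributes to the collected telemetry
  resourcedetection:
    detectors: [env, system]
  # Converts cumulative metrics to delta temporality
  cumulativetodelta:
  batch:
    timeout: 10s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/uptrace]
    metrics:
      # kafkametrics must be referenced here; a receiver no pipeline uses is never started
      receivers: [otlp, kafkametrics]
      processors: [cumulativetodelta, batch, resourcedetection]
      exporters: [otlp/uptrace]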
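The Collector validates pipeline references at startup: every ID listed under service.pipelines must match a component defined in the receivers, processors, or exporters section, including any /name suffix such as otlp/uptrace. A stale reference, for example a mysql receiver left over from another integration's config, makes the Collector report a configuration error instead of starting. The Python sketch below reproduces that check on a dict that roughly mirrors the config above. The function names are hypothetical, and loading the real YAML file (for example with PyYAML's yaml.safe_load) is an assumption not shown here.

```python
from typing import Any

SECTIONS = ("receivers", "processors", "exporters")


def undefined_components(config: dict[str, Any]) -> list[str]:
    """List pipeline references that have no matching top-level definition."""
    problems: list[str] = []
    pipelines = (config.get("service") or {}).get("pipelines") or {}
    for pipeline_name, pipeline in pipelines.items():
        for section in SECTIONS:
            # A bare "cumulativetodelta:" line parses as None, so only keys matter here.
            defined = config.get(section) or {}
            for component_id in pipeline.get(section) or []:
                # IDs must match exactly, including a "/name" suffix like "otlp/uptrace".
                if component_id not in defined:
                    problems.append(f"{pipeline_name}: {section}/{component_id}")
    return problems


def make_config(metrics_receivers: list[str]) -> dict[str, Any]:
    """Hypothetical Python mirror of the YAML config, with configurable metrics receivers.

    Assumption: in practice this dict would come from parsing the YAML file.
    """
    return {
        "receivers": {"otlp": {"protocols": {"grpc": None, "http": None}}, "kafkametrics": {}},
        "processors": {"resourcedetection": {}, "cumulativetodelta": None, "batch": {}},
        "exporters": {"otlp/uptrace": {}},
        "service": {
            "pipelines": {
                "traces": {
                    "receivers": ["otlp"],
                    "processors": ["batch"],
                    "exporters": ["otlp/uptrace"],
                },
                "metrics": {
                    "receivers": metrics_receivers,
                    "processors": ["cumulativetodelta", "batch", "resourcedetection"],
                    "exporters": ["otlp/uptrace"],
                },
            }
        },
    }


if __name__ == "__main__":
    print(undefined_components(make_config(["otlp", "mysql"])))
    print(undefined_components(make_config(["otlp", "kafkametrics"])))
```

Running it prints ['metrics: receivers/mysql'] for the stale reference and [] once the metrics pipeline references kafkametrics instead.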

Restart the OpenTelemetry Collector to apply the changes:

sudo systemctl restart otelcol-contrib

You can also check the OpenTelemetry Collector logs for any errors:

sudo journalctl -u otelcol-contrib -f

OpenTelemetry Backend

Once the metrics are collected and exported, you can visualize them using a compatible backend system. For example, you can use Uptrace to create dashboards that display metrics from the OpenTelemetry Collector.

Uptrace is a Datadog alternative that supports distributed tracing, metrics, and logs. You can use it to monitor applications and troubleshoot issues.


Uptrace comes with an intuitive query builder, rich dashboards, alerting rules with notifications, and integrations for most languages and frameworks.

Uptrace can process billions of spans and metrics on a single server and allows you to monitor your applications at 10x lower cost.

In just a few minutes, you can try Uptrace by visiting the cloud demo (no login required) or running it locally with Docker. The source code is available on GitHub.

What's next?

By monitoring Kafka metrics, you can detect problems and anomalies early and take proactive measures before they escalate. By tracking metrics such as partition lag, replication lag, and consumer lag, you can identify and address potential bottlenecks, slow consumers, or replication delays.
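Consumer lag for a partition is the number of messages written to it that the consumer group has not yet processed: the partition's log end offset (the offset the next produced message will receive) minus the group's committed offset. Per-topic lag is the sum over all partitions. The kafkametrics receiver's consumers scraper reports these values itself (as kafka.consumer_group.lag and kafka.consumer_group.lag_sum), so the Python sketch below, with hypothetical names and offsets, only illustrates the arithmetic behind those metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionOffsets:
    partition: int
    log_end_offset: int    # offset the next produced message will receive
    committed_offset: int  # next offset the consumer group will read


def partition_lag(p: PartitionOffsets) -> int:
    """Messages written to the partition but not yet consumed by the group."""
    # This sketch clamps at zero in case the two offsets were read at slightly different times.
    return max(p.log_end_offset - p.committed_offset, 0)


def topic_lag_sum(partitions: list[PartitionOffsets]) -> int:
    """Total lag across all partitions of one topic for one consumer group."""
    return sum(partition_lag(p) for p in partitions)


if __name__ == "__main__":
    # Hypothetical offsets for a three-partition topic.
    offsets = [
        PartitionOffsets(partition=0, log_end_offset=1500, committed_offset=1480),
        PartitionOffsets(partition=1, log_end_offset=900, committed_offset=900),
        PartitionOffsets(partition=2, log_end_offset=2200, committed_offset=1700),
    ]
    for p in offsets:
        print(f"partition {p.partition}: lag {partition_lag(p)}")
    print("lag sum:", topic_lag_sum(offsets))
```

For the sample offsets this prints a lag of 20, 0, and 500 for partitions 0 to 2 and a lag sum of 520. A partition whose lag keeps growing between scrapes usually points to a slow or stalled consumer.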

Next, you can learn more about configuring the OpenTelemetry Collector. To start using OpenTelemetry and Uptrace, see Getting started with Uptrace.

Last Updated: 7/25/2024, 12:36:08 PM