Kafka Monitoring with OpenTelemetry Collector

Vladimir Mihailenco
November 18, 2025

Apache Kafka is a widely used distributed streaming platform known for its high throughput, fault tolerance, and scalability. Using the OpenTelemetry Collector kafkametrics receiver, you can collect metrics from Kafka brokers, topics, and consumer groups, then send them to an observability backend for analysis and alerting.

Quick Setup

| Step | Action | Details |
|------|--------|---------|
| 1 | Install Collector | Deploy OpenTelemetry Collector Contrib on a host with network access to Kafka brokers |
| 2 | Configure kafkametrics receiver | Add the kafkametrics receiver with broker addresses and scrapers |
| 3 | Set exporter | Point the OTLP exporter at your backend (e.g., Uptrace) using your DSN |
| 4 | Restart Collector | Run `sudo systemctl restart otelcol-contrib` and verify with `journalctl` |

Why Monitor Kafka?

Monitoring Apache Kafka is critical to ensuring the health, performance, and reliability of your cluster. Kafka observability helps you:

  • Detect performance bottlenecks — identify slow consumers and growing partition lag before messages back up.
  • Prevent data loss — catch under-replicated partitions and out-of-sync replicas before they affect availability.
  • Optimize throughput — understand message rates, broker capacity, and partition distribution to right-size your cluster.
  • Troubleshoot issues — correlate Kafka metrics with application traces to pinpoint the root cause of failures.
  • Plan capacity — track topic growth and consumer group scaling needs over time.

What is OpenTelemetry Collector?

OpenTelemetry Collector is a vendor-agnostic agent that collects, processes, and exports telemetry data. You deploy it on a host with network access to your Kafka brokers, where it periodically scrapes metrics and forwards them to your chosen backend.

The Collector provides powerful data processing capabilities including aggregation, filtering, transformation, and enrichment. It supports dozens of receivers for different data sources — the kafkametrics receiver is the one purpose-built for Kafka cluster monitoring.

Configuring the Kafkametrics Receiver

To start monitoring Kafka, configure the kafkametrics receiver in /etc/otelcol-contrib/config.yaml. Replace <FIXME> with your Uptrace DSN:

yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

  kafkametrics:
    brokers:
      - localhost:9092
    protocol_version: 2.0.0
    scrapers:
      - brokers
      - topics
      - consumers
    collection_interval: 30s

exporters:
  otlp/uptrace:
    endpoint: api.uptrace.dev:4317
    headers: { 'uptrace-dsn': '<FIXME>' }

processors:
  resourcedetection:
    detectors: [env, system]
  cumulativetodelta:
  batch:
    timeout: 10s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/uptrace]
    metrics:
      receivers: [otlp, kafkametrics]
      processors: [resourcedetection, cumulativetodelta, batch]
      exporters: [otlp/uptrace]

Available Scrapers

The kafkametrics receiver supports three scraper types. Configure only the ones you need to reduce collection overhead:

| Scraper | What it collects |
|---------|------------------|
| brokers | Broker count in the cluster |
| topics | Partition counts, current offsets, oldest offsets, replica counts, in-sync replica counts |
| consumers | Consumer group lag, offset, and member count |

SASL/SSL Authentication

Production Kafka clusters typically require authentication. The kafkametrics receiver supports SASL and TLS configuration:

yaml
receivers:
  kafkametrics:
    brokers:
      - kafka-broker:9093
    protocol_version: 3.0.0
    scrapers:
      - brokers
      - topics
      - consumers
    auth:
      sasl:
        username: ${KAFKA_USERNAME}
        password: ${KAFKA_PASSWORD}
        mechanism: SCRAM-SHA-512
    tls:
      ca_file: /etc/ssl/certs/ca.pem
      cert_file: /etc/ssl/certs/client.pem
      key_file: /etc/ssl/certs/client-key.pem

Supported SASL mechanisms include PLAIN, SCRAM-SHA-256, and SCRAM-SHA-512. Use environment variables (${KAFKA_USERNAME}) rather than hardcoding credentials in the config file.

Restart and Verify

After updating the configuration, restart the Collector:

shell
sudo systemctl restart otelcol-contrib

Check the logs for any connection or authentication errors:

shell
sudo journalctl -u otelcol-contrib -f

Look for lines confirming that the kafkametrics receiver started successfully and is scraping your brokers.

Kafka Metrics Reference

The kafkametrics receiver emits the following metrics. Use these names when building dashboards and alert rules.

Broker Metrics

| Metric | Type | Description |
|--------|------|-------------|
| kafka.brokers | Gauge | Number of brokers in the cluster |

Topic and Partition Metrics

| Metric | Type | Description |
|--------|------|-------------|
| kafka.topic.partitions | Gauge | Number of partitions for a topic |
| kafka.partition.current_offset | Gauge | Current offset of a partition |
| kafka.partition.oldest_offset | Gauge | Oldest offset of a partition |
| kafka.partition.replicas | Gauge | Number of replicas for a partition |
| kafka.partition.replicas_in_sync | Gauge | Number of in-sync replicas for a partition |

Consumer Group Metrics

| Metric | Type | Description |
|--------|------|-------------|
| kafka.consumer_group.lag | Gauge | Lag of a consumer group for a specific partition (current offset minus consumer offset) |
| kafka.consumer_group.lag_sum | Gauge | Sum of lag across all partitions for a consumer group on a topic |
| kafka.consumer_group.offset | Gauge | Current offset of a consumer group for a specific partition |
| kafka.consumer_group.offset_sum | Gauge | Sum of offsets across all partitions for a consumer group on a topic |
| kafka.consumer_group.members | Gauge | Number of members in a consumer group |

All consumer group metrics include group and topic attributes, which let you filter and group by specific consumer groups or topics in your dashboards.
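To make the lag semantics concrete, here is a small Python sketch (the offset numbers are made up for illustration) of how kafka.consumer_group.lag and kafka.consumer_group.lag_sum relate to partition offsets:

```python
# Hypothetical per-partition offsets for one consumer group on one topic.
# Per-partition lag is the partition's latest (current) offset minus the
# group's committed offset; lag_sum adds this across all partitions.
partitions = {
    0: {"current_offset": 1500, "consumer_offset": 1480},
    1: {"current_offset": 2000, "consumer_offset": 1700},
    2: {"current_offset": 900, "consumer_offset": 900},
}

lag_per_partition = {
    p: o["current_offset"] - o["consumer_offset"] for p, o in partitions.items()
}
lag_sum = sum(lag_per_partition.values())

print(lag_per_partition)  # {0: 20, 1: 300, 2: 0}
print(lag_sum)            # 320
```

Note that partition 1 dominates the total: a high lag_sum often means one slow or stuck partition rather than uniform backlog, which is why the per-partition lag metric is worth keeping on dashboards too.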

Monitoring Multiple Kafka Clusters

When you run more than one Kafka cluster (e.g., production and staging), use the cluster_alias field to distinguish metrics from each cluster. Define multiple receiver instances with unique names:

yaml
receivers:
  kafkametrics/prod:
    cluster_alias: kafka-prod
    brokers:
      - prod-broker-1:9092
      - prod-broker-2:9092
    protocol_version: 3.0.0
    scrapers:
      - brokers
      - topics
      - consumers

  kafkametrics/staging:
    cluster_alias: kafka-staging
    brokers:
      - staging-broker:9092
    protocol_version: 3.0.0
    scrapers:
      - brokers
      - topics
      - consumers

service:
  pipelines:
    metrics:
      receivers: [otlp, kafkametrics/prod, kafkametrics/staging]
      processors: [resourcedetection, cumulativetodelta, batch]
      exporters: [otlp/uptrace]

The cluster_alias value is added as a resource attribute on every metric, so you can filter dashboards and alerts by cluster.

Kafka Distributed Tracing

Metrics tell you what is happening in your Kafka cluster, but distributed tracing shows you how individual messages flow from producers through topics to consumers. By propagating trace context in Kafka message headers, you get end-to-end visibility across your event-driven architecture.

The flow works like this:

  1. Producer creates a span when publishing a message and injects the trace context (traceparent, tracestate) into the message headers.
  2. Message sits in the Kafka topic — the time between produce and consume is visible as the gap between the producer and consumer spans.
  3. Consumer extracts the trace context from the message headers and creates a child span linked to the producer span.

This lets you see the full lifecycle of a message: how long it took to produce, how long it waited in the topic, and how long the consumer took to process it.
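Instrumentation libraries handle this propagation for you, but the mechanics can be sketched in plain Python. The helper names below are hypothetical; the header string follows the W3C Trace Context format that OpenTelemetry propagators use:

```python
import os

def make_traceparent(trace_id: bytes, span_id: bytes) -> str:
    # W3C Trace Context header: version-traceid-spanid-flags.
    return f"00-{trace_id.hex()}-{span_id.hex()}-01"

def inject(headers: list, trace_id: bytes, span_id: bytes) -> None:
    # Kafka message headers are (key, value) pairs with byte values.
    headers.append(("traceparent", make_traceparent(trace_id, span_id).encode()))

def extract(headers: list):
    for key, value in headers:
        if key == "traceparent":
            _version, trace_id, parent_span_id, _flags = value.decode().split("-")
            return trace_id, parent_span_id
    return None

# Producer side: create IDs for the publish span and inject them.
trace_id, span_id = os.urandom(16), os.urandom(8)
headers = []
inject(headers, trace_id, span_id)

# Consumer side: recover the context and parent the consumer span on it.
assert extract(headers) == (trace_id.hex(), span_id.hex())
```

Because the context rides inside the message itself, the producer and consumer can live in different services, languages, and deployments and still end up on the same trace.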

Instrumenting Producers and Consumers

Most OpenTelemetry instrumentation libraries provide automatic Kafka tracing. Here are examples for popular languages:

Go
package main

import (
    "context"
    "log"

    "github.com/IBM/sarama"
    "go.opentelemetry.io/contrib/instrumentation/github.com/IBM/sarama/otelsarama"
)

func main() {
    config := sarama.NewConfig()
    config.Producer.Return.Successes = true
    config.Version = sarama.V2_0_0_0

    brokers := []string{"localhost:9092"}

    // Create and wrap the producer with OpenTelemetry instrumentation.
    producer, err := sarama.NewSyncProducer(brokers, config)
    if err != nil {
        log.Fatal(err)
    }
    defer producer.Close()

    producer = otelsarama.WrapSyncProducer(config, producer)

    // Create and wrap the consumer with OpenTelemetry instrumentation.
    consumer, err := sarama.NewConsumer(brokers, config)
    if err != nil {
        log.Fatal(err)
    }
    defer consumer.Close()

    consumer = otelsarama.WrapConsumer(consumer)

    // Now every Produce and Consume call automatically creates
    // spans and propagates trace context through message headers.
    msg := &sarama.ProducerMessage{
        Topic: "my-topic",
        Value: sarama.StringEncoder("hello"),
    }
    _, _, err = producer.SendMessage(msg)
    if err != nil {
        log.Fatal(err)
    }
}

Python
# pip install opentelemetry-instrumentation-kafka-python kafka-python
from opentelemetry.instrumentation.kafka import KafkaInstrumentor

# Call instrument() once at application startup.
# This patches kafka-python Producer and Consumer classes
# to automatically create spans and propagate trace context.
KafkaInstrumentor().instrument()

# After instrumentation, use kafka-python as usual.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("my-topic", b"hello")
producer.flush()

consumer = KafkaConsumer("my-topic", bootstrap_servers="localhost:9092")
for message in consumer:
    print(message.value)

Java
// The OpenTelemetry Java agent automatically instruments Kafka clients.
// No code changes are needed — just attach the agent at startup:
//
//   java -javaagent:opentelemetry-javaagent.jar \
//        -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
//        -jar your-app.jar
//
// The agent automatically instruments:
//   - org.apache.kafka:kafka-clients
//   - org.springframework.kafka (spring-kafka)
//   - io.projectreactor.kafka (reactor-kafka)
//
// Producer spans are created on send(), consumer spans on poll().
// Trace context is propagated through Kafka message headers.

Node.js
// npm install @opentelemetry/instrumentation-kafkajs kafkajs
const { KafkaJsInstrumentation } = require('@opentelemetry/instrumentation-kafkajs');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
const { Kafka } = require('kafkajs');

// Register the instrumentation before creating any Kafka clients.
registerInstrumentations({
  instrumentations: [new KafkaJsInstrumentation()],
});

async function main() {
  const kafka = new Kafka({ brokers: ['localhost:9092'] });

  // Producer usage — spans are created automatically.
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: 'my-topic',
    messages: [{ value: 'hello' }],
  });

  // Consumer usage — spans are created automatically.
  const consumer = kafka.consumer({ groupId: 'my-group' });
  await consumer.connect();
  await consumer.subscribe({ topic: 'my-topic' });
  await consumer.run({
    eachMessage: async ({ message }) => {
      console.log(message.value.toString());
    },
  });
}

main().catch(console.error);

Alerting on Kafka Metrics

Once metrics are flowing into your backend, set up alerts for the most important failure scenarios.

Consumer Lag Alert

Consumer lag is the most common Kafka alert. A sustained increase means consumers are falling behind, which can lead to message backlogs and processing delays:

  • Metric: kafka.consumer_group.lag_sum
  • Condition: Value exceeds a threshold for a sustained period (e.g., lag > 10000 for 5 minutes)
  • Group by: group, topic — so you get a separate alert for each consumer group and topic combination

When this alert fires, investigate whether you need more consumer instances, whether the consumer processing logic is slow, or whether there was a burst of messages.
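The "sustained period" condition above can be sketched in a few lines of Python; the threshold, window size, and one-sample-per-minute cadence are illustrative, not prescriptive:

```python
def lag_alert(samples, threshold=10_000, sustained=5):
    # samples: one lag_sum reading per minute, oldest first.
    # Fire only when the last `sustained` readings all exceed the
    # threshold, so a single spike does not page anyone.
    recent = samples[-sustained:]
    return len(recent) == sustained and all(s > threshold for s in recent)

print(lag_alert([500, 12_000, 400, 300, 200]))              # False: spike only
print(lag_alert([11_000, 12_500, 13_000, 15_000, 18_000]))  # True: sustained
```

Most backends (including Uptrace) express this as a "for N minutes" clause on the alert rule rather than custom code, but the evaluation logic is the same.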

Under-Replicated Partitions Alert

Under-replicated partitions indicate that some replicas are out of sync, which puts data durability at risk:

  • Metric: kafka.partition.replicas minus kafka.partition.replicas_in_sync
  • Condition: The difference is greater than 0 for more than a few minutes
  • Group by: topic — to identify which topics are affected

Common causes include broker overload, disk I/O bottlenecks, and network issues between brokers.
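As a sketch of the difference-based condition above, here is how the under-replicated check works against the two replica metrics (the partition data is hypothetical):

```python
# Hypothetical per-partition replica counts for one topic.
partitions = [
    {"partition": 0, "replicas": 3, "replicas_in_sync": 3},
    {"partition": 1, "replicas": 3, "replicas_in_sync": 2},  # one follower lagging
    {"partition": 2, "replicas": 3, "replicas_in_sync": 3},
]

# A partition is under-replicated when in-sync replicas fall below
# the configured replica count.
under_replicated = [
    p["partition"] for p in partitions if p["replicas"] > p["replicas_in_sync"]
]
print(under_replicated)  # [1]
```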

Broker Count Alert

An unexpected drop in the broker count means a broker has gone offline:

  • Metric: kafka.brokers
  • Condition: Value is less than your expected broker count
  • Severity: Critical — a lost broker means some partitions lose a replica (or a leader)

Troubleshooting Kafka Performance

High Consumer Lag

Symptom: Messages are backing up in topics, and consumers cannot keep up.

Diagnosis: Check the kafka.consumer_group.lag_sum metric, broken down by group and topic.

Common causes:

  • Slow consumer processing (e.g., blocking database calls, heavy computation)
  • Insufficient consumer instances — Kafka limits parallelism to one consumer per partition
  • Network issues between consumers and brokers
  • Consumer rebalances triggered by frequent restarts or long processing times

Fixes:

  • Add more consumers to the group (up to the number of partitions)
  • Optimize consumer processing logic — move heavy work to a thread pool
  • Increase max.poll.records and fetch.max.bytes to process larger batches
  • Increase max.poll.interval.ms if processing takes longer than the default 5 minutes
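One of the fixes above, moving heavy work off the poll loop, can be sketched in Python with a thread pool. The `process` function and the batch contents are hypothetical stand-ins for your per-message logic and a `poll()` result:

```python
from concurrent.futures import ThreadPoolExecutor

def process(message):
    # Placeholder for heavy per-message work (database calls, computation).
    return message.upper()

# Fan the batch out to worker threads so the consumer can get back to
# polling quickly, instead of processing each message serially.
with ThreadPoolExecutor(max_workers=8) as pool:
    batch = ["a", "b", "c"]  # stand-in for one poll() batch
    results = list(pool.map(process, batch))

print(results)  # ['A', 'B', 'C']
```

If you offload work this way, commit offsets only after the batch completes, or a crash mid-batch can silently drop messages.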

Under-Replicated Partitions

Symptom: kafka.partition.replicas_in_sync is less than kafka.partition.replicas for one or more partitions.

Meaning: Some partition replicas are lagging behind the leader, which reduces fault tolerance.

Causes:

  • Broker overload (CPU, memory, or network saturation)
  • Disk I/O bottlenecks on follower brokers
  • Network problems between brokers
  • A broker is offline or restarting

Fixes:

  • Check broker resource usage with system metrics (CPU, disk, network)
  • Verify all brokers are healthy with kafka-broker-api-versions.sh
  • Reassign partitions to balance load across brokers
  • Review replica.lag.time.max.ms settings

High Request Latency

Symptom: Produce and fetch requests are slow.

Diagnosis: Check broker CPU, disk I/O, and network utilization alongside request latency.

Fixes:

  • Increase the number of I/O threads (num.io.threads) and network threads (num.network.threads)
  • Optimize batch sizes — larger batches amortize overhead but add latency
  • Enable compression (lz4 or zstd) to reduce network transfer
  • Add more brokers and rebalance partitions to spread the load

OpenTelemetry Backend

Once metrics are collected and exported, you can visualize them using a compatible backend. Uptrace is an OpenTelemetry APM platform that supports distributed tracing, metrics, and logs.

Uptrace Overview

Uptrace provides an intuitive query builder, dashboards, alerting rules with notifications, and integrations for most languages and frameworks. You can try it by visiting the cloud demo (no login required) or running it locally with Docker. The source code is available on GitHub.

What's next?

With Kafka metrics flowing into your observability backend, you can build dashboards to track cluster health and set up alerts on consumer lag and replication issues before they affect your applications.
