Monitor Kafka with OpenTelemetry
Apache Kafka is a widely used distributed streaming platform known for its high throughput, fault tolerance, and scalability.
Using the OpenTelemetry Collector Kafka receiver, you can collect telemetry data from Kafka applications and send it to your observability backend for analysis and visualization.
What is OpenTelemetry Collector?
You can deploy OpenTelemetry Collector as an agent that runs on individual hosts, where it periodically collects and forwards diagnostic information about the running system to various distributed tracing tools.
OpenTelemetry Collector provides powerful data processing capabilities. It can aggregate, filter, transform, and enrich telemetry data as it flows through the system.
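As a rough illustration of the "enrich" step, the fragment below sketches a resource processor that tags every metric with an environment attribute. The processor name `resource/kafka` and the value `production` are placeholders chosen for this example and do not come from the original configuration.

```yaml
processors:
  # Hypothetical name; anything after the slash is a free-form label.
  resource/kafka:
    attributes:
      # Add the attribute, or overwrite it if it is already present.
      - key: deployment.environment
        value: production # placeholder value
        action: upsert
```

A processor only takes effect once it is listed under `processors` in a pipeline in the `service` section.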
With OpenTelemetry Collector, you can collect telemetry data from your Kafka clusters and send it to the OpenTelemetry backend of your choice. This allows you to gain insight into the behavior and performance of your Kafka messaging system, monitor message processing times, track message flows, and analyze the overall health of your Kafka-based applications.
OpenTelemetry Kafka receiver
Monitoring Apache Kafka is critical to ensuring the health, performance, and reliability of your Kafka cluster.
Kafka metrics help you identify performance bottlenecks, resource utilization issues, and inefficiencies within your cluster. By tracking metrics such as CPU usage, disk utilization, network traffic, and message rates, you can tune your Kafka deployment for performance and scalability.
To start monitoring Kafka, configure the Kafka metrics receiver in /etc/otelcol-contrib/config.yaml using your Uptrace DSN:
receivers:
  otlp:
    protocols:
      grpc:
      http:
  # Scrapes broker, topic, and consumer group metrics from Kafka.
  kafkametrics:
    brokers: localhost:9092
    protocol_version: 2.0.0
    scrapers:
      - brokers
      - topics
      - consumers

exporters:
  otlp/uptrace:
    endpoint: otlp.uptrace.dev:4317
    # Replace <FIXME> with your Uptrace DSN.
    headers: { 'uptrace-dsn': '<FIXME>' }

processors:
  resourcedetection:
    detectors: [env, system]
  cumulativetodelta:
  batch:
    timeout: 10s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/uptrace]
    metrics:
      # The receiver listed here must match the one defined above.
      receivers: [otlp, kafkametrics]
      processors: [cumulativetodelta, batch, resourcedetection]
      exporters: [otlp/uptrace]
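The receiver accepts a few optional settings in addition to the ones shown above. The fragment below is a sketch of how the scrape interval and the set of monitored topics and consumer groups might be narrowed. The `orders.` topic prefix and the 30s interval are assumptions made for illustration, so check the option names against the receiver documentation for your Collector version.

```yaml
receivers:
  kafkametrics:
    brokers: localhost:9092
    protocol_version: 2.0.0
    # How often to scrape Kafka (illustrative value).
    collection_interval: 30s
    # Regex of topics to include; the "orders." prefix is hypothetical.
    topic_match: '^orders\..*$'
    # Regex of consumer groups to include; this one matches all groups.
    group_match: '.*'
    scrapers:
      - brokers
      - topics
      - consumers
```

Narrowing `topic_match` and `group_match` reduces the number of time series exported, which can matter on clusters with many topics.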
Don't forget to restart OpenTelemetry Collector:
sudo systemctl restart otelcol-contrib
You can also check OpenTelemetry Collector logs for any errors:
sudo journalctl -u otelcol-contrib -f
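If the logs show no errors but no Kafka metrics reach the backend, it can help to print the collected metrics to the Collector's own output. The sketch below assumes a recent Collector release that ships the `debug` exporter (older releases used an exporter named `logging` instead) and adds it next to the existing exporter in the metrics pipeline.

```yaml
exporters:
  # Prints received telemetry to the Collector log.
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [otlp, kafkametrics]
      processors: [cumulativetodelta, batch, resourcedetection]
      # Keep the Uptrace exporter and add debug alongside it.
      exporters: [otlp/uptrace, debug]
```

The scraped metrics then show up in the journalctl output above. Remove the `debug` exporter once the pipeline is confirmed to work, since detailed output is verbose.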
OpenTelemetry Backend
Once the metrics are collected and exported, you can visualize them using a compatible backend system. For example, you can use Uptrace to create dashboards that display metrics from the OpenTelemetry Collector.
Uptrace is a DataDog alternative that supports distributed tracing, metrics, and logs. You can use it to monitor applications and troubleshoot issues.
Uptrace comes with an intuitive query builder, rich dashboards, alerting rules with notifications, and integrations for most languages and frameworks.
Uptrace can process billions of spans and metrics on a single server and allows you to monitor your applications at 10x lower cost.
In just a few minutes, you can try Uptrace by visiting the cloud demo (no login required) or running it locally with Docker. The source code is available on GitHub.
What's next?
By monitoring Kafka metrics, you can detect problems and anomalies early and take proactive measures before they escalate. By tracking metrics such as partition lag, replication lag, and consumer lag, you can identify and address potential bottlenecks, slow consumers, or replication delays.
Next, you can learn more about configuring OpenTelemetry Collector. To start using OpenTelemetry and Uptrace, see Getting started with Uptrace.