OpenTelemetry Kubernetes Monitoring


Kubernetes has become the de facto standard for container orchestration and is widely adopted by organizations of all sizes.

By using OpenTelemetry in conjunction with Kubernetes, you can gain deep insights into the performance and behavior of your applications, monitor their health and resource usage, and troubleshoot issues more effectively.

What is OpenTelemetry Collector?

OpenTelemetry Collector is an agent that pulls telemetry data from the systems you want to monitor and exports the collected data to an OpenTelemetry backend.

Otel Collector provides powerful data processing capabilities, allowing you to perform aggregation, filtering, sampling, and enrichment of telemetry data. You can transform and reshape the data to fit your specific monitoring and analysis requirements before sending it to the backend systems.
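As a rough sketch of what such processing can look like, the fragment below combines three processors from the Collector contrib distribution: resource for enrichment, probabilistic_sampler for sampling, and filter for filtering. The attribute value, the sampling percentage, and the dropped metric name are illustrative assumptions, not recommendations.

```yaml
processors:
  # Enrichment: stamp every resource with an environment attribute.
  resource:
    attributes:
      - key: deployment.environment
        value: production # assumed value
        action: upsert
  # Sampling: keep roughly 10% of traces (assumed percentage).
  probabilistic_sampler:
    sampling_percentage: 10
  # Filtering: drop metrics that match the condition (assumed metric name).
  filter:
    error_mode: ignore
    metrics:
      metric:
        - 'name == "k8s.pod.phase"'
```

A processor only takes effect once it is listed under processors in a pipeline in the service section.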

Authentication

To collect Kubernetes telemetry, Otel Collector queries the Kubernetes API and monitors the state of various Kubernetes resources.

To use the Kubernetes API, you need to configure an authentication method, for example, a service account. With a service account, the Otel Collector must run in the same K8s cluster, and the service account must have sufficient permissions to use the API.
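For illustration, a service account can be attached to the Collector pods roughly as follows. This is a minimal sketch: the otel-collector name, the monitoring namespace, and the image reference are assumptions, and the config volume is left out for brevity.

```yaml
# Hypothetical names: "otel-collector" and "monitoring" are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector
  namespace: monitoring
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      # Pods run under this account; Kubernetes mounts its token into the pod.
      serviceAccountName: otel-collector
      containers:
        - name: otel-collector
          # Assumed image; pin a specific tag in practice.
          image: otel/opentelemetry-collector-contrib
          # Config volume and mounts omitted for brevity.
```

The service account has no permissions by itself. Access to the API is granted separately through RBAC roles bound to the account.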

Monitoring Kubernetes Cluster

The OpenTelemetry Kubernetes Cluster receiver allows you to collect observability data from your Kubernetes cluster. It captures telemetry data about the cluster's nodes, pods, containers, and other resources, providing insights into the cluster's health, performance, and resource utilization.

To start monitoring your Kubernetes cluster, you need to configure the receiver in /etc/otel-contrib-collector/config.yaml using your Uptrace DSN:

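receivers:
  # The pipelines below reference the otlp receiver, so it must be defined.
  otlp:
    protocols:
      grpc:
      http:
  k8s_cluster:
    auth_type: serviceAccount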

exporters:
  otlp:
    endpoint: otlp.uptrace.dev:4317
    headers: { 'uptrace-dsn': '<FIXME>' }

processors:
  resourcedetection:
    detectors: [env, system]
  cumulativetodelta:
  batch:
    timeout: 10s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp, k8s_cluster]
      # batch goes last so that it batches fully processed data
      processors: [resourcedetection, cumulativetodelta, batch]
      exporters: [otlp]

Don't forget to create a service account with sufficient permissions.
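A possible RBAC setup for the cluster receiver is sketched below. The rule list is an assumption based on the kinds of resources the receiver watches, and the otel-collector service account in the monitoring namespace is a hypothetical name; consult the receiver documentation for the authoritative set of rules.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-cluster # hypothetical name
rules:
  # Core resources: read-only access is enough for monitoring.
  - apiGroups: ['']
    resources:
      - events
      - namespaces
      - nodes
      - pods
      - replicationcontrollers
      - resourcequotas
      - services
    verbs: [get, list, watch]
  # Workload controllers.
  - apiGroups: [apps]
    resources: [daemonsets, deployments, replicasets, statefulsets]
    verbs: [get, list, watch]
  - apiGroups: [batch]
    resources: [jobs, cronjobs]
    verbs: [get, list, watch]
  - apiGroups: [autoscaling]
    resources: [horizontalpodautoscalers]
    verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector-cluster
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-collector-cluster
subjects:
  # Assumed service account used by the Collector pods.
  - kind: ServiceAccount
    name: otel-collector
    namespace: monitoring
```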

See the Helm example and the Otelcol documentation for more details.
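Beyond auth_type, the cluster receiver accepts a few optional settings. The fragment below is a sketch with assumed values showing how the collection interval and the reported node data might be tuned.

```yaml
receivers:
  k8s_cluster:
    auth_type: serviceAccount
    # How often metrics are emitted (assumed value).
    collection_interval: 30s
    # Node conditions to report as metrics.
    node_conditions_to_report: [Ready, MemoryPressure, DiskPressure]
    # Allocatable resource types to report for each node.
    allocatable_types_to_report: [cpu, memory]
```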

Monitoring Kubernetes Pods

OpenTelemetry Collector also provides the Kubelet Stats receiver for collecting metrics from the Kubelet, the primary node agent that runs on each node in a Kubernetes cluster.

Although it's possible to use Kubernetes' hostNetwork feature to talk to the Kubelet API from a pod, the preferred approach is to use the downward API and a service account.

Make sure the pod spec sets the node name as follows:

env:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

Then the Otel Collector config can reference the K8S_NODE_NAME environment variable:

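receivers:
  # The pipelines below reference the otlp receiver, so it must be defined.
  otlp:
    protocols:
      grpc:
      http:
  kubeletstats:
    auth_type: 'serviceAccount'
    endpoint: 'https://${env:K8S_NODE_NAME}:10250'
    # Skips verification of the Kubelet's TLS certificate.
    insecure_skip_verify: true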

exporters:
  otlp:
    endpoint: otlp.uptrace.dev:4317
    headers: { 'uptrace-dsn': '<FIXME>' }

processors:
  resourcedetection:
    detectors: [env, system]
  cumulativetodelta:
  batch:
    timeout: 10s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp, kubeletstats]
      # batch goes last so that it batches fully processed data
      processors: [resourcedetection, cumulativetodelta, batch]
      exporters: [otlp]

Don't forget to create a service account with sufficient permissions.
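For the Kubelet Stats receiver, the permissions can be much narrower than for cluster-wide monitoring. The following is a minimal sketch, assuming a hypothetical otel-collector service account in the monitoring namespace; verify the rules against the receiver documentation for your Collector version.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-kubelet # hypothetical name
rules:
  # Read access to the Kubelet stats endpoint.
  - apiGroups: ['']
    resources: [nodes/stats]
    verbs: [get]
  # Only needed for extra metadata labels or request/limit utilization metrics.
  - apiGroups: ['']
    resources: [nodes/proxy]
    verbs: [get]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector-kubelet
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-collector-kubelet
subjects:
  # Assumed service account used by the Collector pods.
  - kind: ServiceAccount
    name: otel-collector
    namespace: monitoring
```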

See the Helm example and the Otelcol documentation for more details.
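The Kubelet Stats receiver can also be narrowed to the metric groups you care about. The sketch below uses assumed values for the interval and the selected groups.

```yaml
receivers:
  kubeletstats:
    auth_type: serviceAccount
    endpoint: 'https://${env:K8S_NODE_NAME}:10250'
    insecure_skip_verify: true
    # How often the Kubelet is scraped (assumed value).
    collection_interval: 20s
    # Supported groups are node, pod, container, and volume.
    metric_groups: [node, pod, container]
```

Leaving out a group such as volume reduces the number of exported time series when that data is not needed.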

OpenTelemetry Backend

Once the metrics are collected and exported, you can visualize them using a compatible backend system. For example, you can use Uptrace to create dashboards that display metrics from the OpenTelemetry Collector.

Uptrace is a DataDog competitor that supports distributed tracing, metrics, and logs. You can use it to monitor applications and troubleshoot issues.


Uptrace comes with an intuitive query builder, rich dashboards, alerting rules with notifications, and integrations for most languages and frameworks.

Uptrace can process billions of spans and metrics on a single server and allows you to monitor your applications at 10x lower cost.

In just a few minutes, you can try Uptrace by visiting the cloud demo (no login required) or running it locally with Docker. The source code is available on GitHub.

What's next?

Next, you can learn more about configuring OpenTelemetry Collector. To start using OpenTelemetry and Uptrace, see Getting started with Uptrace.

Last Updated: 7/25/2024, 12:36:08 PM