OpenTelemetry Docker Monitoring


Docker has gained popularity due to its ability to simplify application deployment, improve scalability, and increase development productivity. It has become a standard tool in many software development and deployment workflows.

By using the OpenTelemetry Docker Stats receiver, you can gather container-level metrics from Docker and integrate them into your observability infrastructure for monitoring and analysis purposes.

What is OpenTelemetry Collector?

OpenTelemetry Collector facilitates the collection, processing, and export of telemetry data from multiple sources. It acts as an intermediary between applications and observability backends, enabling unified data collection and export.

With OpenTelemetry Collector, you can centralize and standardize your telemetry data collection, apply data processing operations, and seamlessly export data to multiple OpenTelemetry backends. It supports a range of processors that can manipulate data, apply sampling strategies, and perform other data transformations based on your requirements.
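
As a minimal sketch of that receive, process, and export flow, the config below accepts OTLP data, samples traces with the contrib `probabilistic_sampler` processor, batches them, and prints them to the Collector's log with the `debug` exporter. The 25% sampling rate is an arbitrary example value, and the `debug` exporter assumes a reasonably recent Collector release (older builds ship a `logging` exporter instead).

```yaml
# Minimal sketch: one receiver, two processors, one exporter.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  # Keep roughly 25% of traces (arbitrary example value).
  probabilistic_sampler:
    sampling_percentage: 25
  # Group telemetry into batches before export.
  batch:

exporters:
  # Print received telemetry to the Collector's own log output.
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler, batch]
      exporters: [debug]
```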

OpenTelemetry Docker Stats

OpenTelemetry Docker Stats receiver allows you to collect container-level resource metrics from Docker. It retrieves metrics such as CPU usage, memory usage, network statistics, and disk I/O from Docker containers and exposes them as OpenTelemetry metrics.

CPU metrics

| Metric | Description |
| --- | --- |
| container.cpu.usage.system | System CPU usage, as reported by Docker. |
| container.cpu.usage.total | Total CPU time consumed. |
| container.cpu.usage.kernelmode | Time spent by tasks of the cgroup in kernel mode (Linux). |
| container.cpu.usage.usermode | Time spent by tasks of the cgroup in user mode (Linux). |
| container.cpu.usage.percpu | Per-core CPU usage by the container. |
| container.cpu.throttling_data.periods | Number of periods with throttling active. |
| container.cpu.throttling_data.throttled_periods | Number of periods when the container hits its throttling limit. |
| container.cpu.throttling_data.throttled_time | Aggregate time the container was throttled. |
| container.cpu.percent | Percent of CPU used by the container. |

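
To read `container.cpu.percent` alongside the raw counters, it helps to know how a CPU percentage is usually derived from two consecutive samples. The formula below is the one the `docker stats` CLI uses; assuming the receiver follows the same approach, ΔU_container is the change in container.cpu.usage.total, ΔU_system is the change in container.cpu.usage.system between successive collections, and N_cpus is the number of online CPUs:

```latex
\text{cpu\_percent} = \frac{\Delta U_{\text{container}}}{\Delta U_{\text{system}}} \times N_{\text{cpus}} \times 100
```

For example, if a container consumed 2 s of CPU time while the host accumulated 16 s of system CPU time across 4 CPUs, the result is 2/16 × 4 × 100 = 50, i.e. half of one core. Because the system counter covers all host CPUs, 100% corresponds to one fully busy core, and a multi-threaded container can report values above 100.
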
Memory metrics

| Metric | Description |
| --- | --- |
| container.memory.usage.limit | Memory limit of the container. |
| container.memory.usage.total | Memory usage of the container. This excludes the cache. |
| container.memory.usage.max | Maximum memory usage. |
| container.memory.percent | Percentage of memory used. |
| container.memory.cache | The amount of memory used by the processes of this control group that can be associated precisely with a block on a block device. |
| container.memory.rss | The amount of memory that doesn’t correspond to anything on disk: stacks, heaps, and anonymous memory maps. |
| container.memory.rss_huge | Number of bytes of anonymous transparent hugepages in this cgroup. |
| container.memory.dirty | Bytes that are waiting to get written back to the disk, from this cgroup. |
| container.memory.writeback | Number of bytes of file/anon cache that are queued for syncing to disk in this cgroup. |
| container.memory.mapped_file | Indicates the amount of memory mapped by the processes in the control group. |
| container.memory.swap | The amount of swap currently used by the processes in this cgroup. |

BlockIO metrics

| Metric | Description |
| --- | --- |
| container.blockio.io_merged_recursive | Number of bios/requests merged into requests belonging to this cgroup and its descendant cgroups. |
| container.blockio.io_queued_recursive | Number of requests queued up for this cgroup and its descendant cgroups. |
| container.blockio.io_service_bytes_recursive | Number of bytes transferred to/from the disk by the group and descendant groups. |
| container.blockio.io_service_time_recursive | Total amount of time in nanoseconds between request dispatch and request completion for the IOs done by this cgroup and descendant cgroups. |
| container.blockio.io_serviced_recursive | Number of IOs (bio) issued to the disk by the group and descendant groups. |
| container.blockio.io_time_recursive | Disk time allocated to cgroup (and descendant cgroups) per device in milliseconds. |
| container.blockio.io_wait_time_recursive | Total amount of time the IOs for this cgroup (and descendant cgroups) spent waiting in the scheduler queues for service. |
| container.blockio.sectors_recursive | Number of sectors transferred to/from disk by the group and descendant groups. |

Network metrics

| Metric | Description |
| --- | --- |
| container.network.io.usage.rx_bytes | Bytes received by the container. |
| container.network.io.usage.tx_bytes | Bytes sent by the container. |
| container.network.io.usage.rx_dropped | Incoming packets dropped. |
| container.network.io.usage.tx_dropped | Outgoing packets dropped. |
| container.network.io.usage.rx_errors | Errors encountered while receiving. |
| container.network.io.usage.tx_errors | Errors encountered while sending. |
| container.network.io.usage.rx_packets | Packets received. |
| container.network.io.usage.tx_packets | Packets sent. |
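
Any metric from these tables can be switched on or off individually under the receiver's `metrics` key, as the Usage section does for two of them. As an illustration, this sketch turns on the three CPU throttling metrics and turns off one block I/O metric; the selection is arbitrary, and which metrics are enabled by default depends on the receiver version you run.

```yaml
receivers:
  docker_stats:
    endpoint: unix:///var/run/docker.sock
    metrics:
      # Illustrative selection only; check the defaults for your receiver version.
      container.cpu.throttling_data.periods:
        enabled: true
      container.cpu.throttling_data.throttled_periods:
        enabled: true
      container.cpu.throttling_data.throttled_time:
        enabled: true
      container.blockio.io_merged_recursive:
        enabled: false
```

Comparing throttled_periods with periods shows what fraction of CPU enforcement periods hit the container's limit; a consistently high fraction suggests the CPU limit is too tight for the workload.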

Usage

OpenTelemetry Docker Stats receiver provides a convenient way to collect performance metrics from Docker containers, which can help you monitor the health and performance of your containerized applications.

To start monitoring Docker, configure the Docker Stats receiver in /etc/otel-contrib-collector/config.yaml and set your Uptrace DSN in the OTLP exporter headers:

receivers:
  # Accept OTLP data (traces and metrics) from instrumented applications.
  # The pipelines below reference this receiver, so it must be defined.
  otlp:
    protocols:
      grpc:
      http:
  # Scrape container stats from the Docker daemon.
  docker_stats:
    endpoint: unix:///var/run/docker.sock
    collection_interval: 15s
    # Copy container labels and environment variables onto metric labels.
    container_labels_to_metric_labels:
      my.container.label: my-metric-label
      my.other.container.label: my-other-metric-label
    env_vars_to_metric_labels:
      MY_ENVIRONMENT_VARIABLE: my-metric-label
      MY_OTHER_ENVIRONMENT_VARIABLE: my-other-metric-label
    # Skip containers whose image name matches a string, a /regex/, or a glob.
    excluded_images:
      - undesired-container
      - /.*undesired.*/
      - another-*-container
    # Turn individual metrics on or off.
    metrics:
      container.cpu.usage.percpu:
        enabled: true
      container.network.io.usage.tx_dropped:
        enabled: false

exporters:
  otlp:
    endpoint: otlp.uptrace.dev:4317
    # Replace <FIXME> with your Uptrace DSN.
    headers: { 'uptrace-dsn': '<FIXME>' }

processors:
  resourcedetection:
    detectors: [env, system]
  cumulativetodelta:
  batch:
    timeout: 10s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp, docker_stats]
      processors: [cumulativetodelta, batch, resourcedetection]
      exporters: [otlp]

Note that the config above talks to Docker through the /var/run/docker.sock Unix socket, so the user running the Otel Collector must be able to read and write that socket, for example by being a member of the docker group.
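
If the Collector itself runs in a container, one way to give it access is to bind-mount the host socket. The Docker Compose sketch below assumes the `otel/opentelemetry-collector-contrib` image, which reads `/etc/otelcol-contrib/config.yaml` by default and runs as a non-root user; the group ID `999` is a hypothetical placeholder for whichever group owns the socket on your host.

```yaml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      # Assumption: the contrib image loads its config from this path by default.
      - ./config.yaml:/etc/otelcol-contrib/config.yaml:ro
      # Make the host's Docker socket available at the path the receiver uses.
      - /var/run/docker.sock:/var/run/docker.sock:ro
    # Hypothetical GID: replace 999 with the group that owns docker.sock on the host
    # (on Linux, `stat -c %g /var/run/docker.sock` prints it).
    group_add:
      - "999"
```

The `:ro` flag only makes the mount read-only; it does not limit which Docker API calls the Collector can make through the socket.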

Alternatively, you can configure the Docker daemon to also listen on 0.0.0.0:2375 and adjust the Otel Collector config accordingly. This port serves the Docker API without encryption or authentication, so restrict who can reach it:

receivers:
  docker_stats:
    endpoint: http://localhost:2375

See the documentation for more details.

OpenTelemetry Backend

Once the metrics are collected and exported, you can visualize them using a compatible backend system. For example, you can use Uptrace to create dashboards that display metrics from the OpenTelemetry Collector.

Uptrace is a Grafana alternative that supports distributed tracing, metrics, and logs. You can use it to monitor applications and troubleshoot issues.


Uptrace comes with an intuitive query builder, rich dashboards, alerting rules with notifications, and integrations for most languages and frameworks.

Uptrace can process billions of spans and metrics on a single server and allows you to monitor your applications at 10x lower cost.

In just a few minutes, you can try Uptrace by visiting the cloud demo (no login required) or running it locally with Docker. The source code is available on GitHub.

What's next?

Next, you can learn more about configuring OpenTelemetry Collector. To start using OpenTelemetry and Uptrace, see Getting started with Uptrace.

Last Updated: 7/25/2024, 12:36:08 PM