# OpenTelemetry Docker Monitoring

Vladimir Mihailenco
March 01, 2024
3 min read

Docker has gained popularity due to its ability to simplify application deployment, improve scalability, and increase development productivity. It has become a standard tool in many software development and deployment workflows.

By using the OpenTelemetry Docker Stats receiver, you can gather container-level metrics from Docker and integrate them into your observability infrastructure for monitoring and analysis purposes.

## What is OpenTelemetry Collector?

The OpenTelemetry Collector facilitates the collection, processing, and export of telemetry data from multiple sources. It acts as an intermediary between applications and observability backends, enabling unified data collection and export.

With the OpenTelemetry Collector, you can centralize and standardize your telemetry data collection, apply data processing operations, and seamlessly export data to multiple OpenTelemetry-compatible APM backends. It supports a range of processors that can manipulate data, apply sampling strategies, and perform other data transformations based on your requirements.
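
For illustration, here is a minimal sketch of a Collector configuration that receives OTLP metrics, batches them, and forwards them to an OTLP-compatible backend (the exporter endpoint is a placeholder):

```yaml
# Minimal Collector pipeline: receive OTLP data, batch it, export it.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlp:
    # Placeholder endpoint; point it at your backend.
    endpoint: localhost:4317

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```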

## OpenTelemetry Docker Stats

The OpenTelemetry Docker Stats receiver allows you to collect container-level resource metrics from Docker. It retrieves metrics such as CPU usage, memory usage, network statistics, and disk I/O from Docker containers and exposes them as OpenTelemetry metrics.

::: details CPU metrics

| Metric | Description |
|--------|-------------|
| container.cpu.usage.system | System CPU usage, as reported by docker. |
| container.cpu.usage.total | Total CPU time consumed. |
| container.cpu.usage.kernelmode | Time spent by tasks of the cgroup in kernel mode (Linux). |
| container.cpu.usage.usermode | Time spent by tasks of the cgroup in user mode (Linux). |
| container.cpu.usage.percpu | Per-core CPU usage by the container. |
| container.cpu.throttling_data.periods | Number of periods with throttling active. |
| container.cpu.throttling_data.throttled_periods | Number of periods when the container hits its throttling limit. |
| container.cpu.throttling_data.throttled_time | Aggregate time the container was throttled. |
| container.cpu.percent | Percent of CPU used by the container. |

:::

::: details Memory metrics

| Metric | Description |
|--------|-------------|
| container.memory.usage.limit | Memory limit of the container. |
| container.memory.usage.total | Memory usage of the container. This excludes the cache. |
| container.memory.usage.max | Maximum memory usage. |
| container.memory.percent | Percentage of memory used. |
| container.memory.cache | The amount of memory used by the processes of this control group that can be associated precisely with a block on a block device. |
| container.memory.rss | The amount of memory that doesn’t correspond to anything on disk: stacks, heaps, and anonymous memory maps. |
| container.memory.rss_huge | Number of bytes of anonymous transparent hugepages in this cgroup. |
| container.memory.dirty | Bytes that are waiting to get written back to the disk, from this cgroup. |
| container.memory.writeback | Number of bytes of file/anon cache that are queued for syncing to disk in this cgroup. |
| container.memory.mapped_file | Indicates the amount of memory mapped by the processes in the control group. |
| container.memory.swap | The amount of swap currently used by the processes in this cgroup. |

:::

::: details BlockIO metrics

| Metric | Description |
|--------|-------------|
| container.blockio.io_merged_recursive | Number of bios/requests merged into requests belonging to this cgroup and its descendant cgroups. |
| container.blockio.io_queued_recursive | Number of requests queued up for this cgroup and its descendant cgroups. |
| container.blockio.io_service_bytes_recursive | Number of bytes transferred to/from the disk by the group and descendant groups. |
| container.blockio.io_service_time_recursive | Total amount of time in nanoseconds between request dispatch and request completion for the IOs done by this cgroup and descendant cgroups. |
| container.blockio.io_serviced_recursive | Number of IOs (bio) issued to the disk by the group and descendant groups. |
| container.blockio.io_time_recursive | Disk time allocated to cgroup (and descendant cgroups) per device in milliseconds. |
| container.blockio.io_wait_time_recursive | Total amount of time the IOs for this cgroup (and descendant cgroups) spent waiting in the scheduler queues for service. |
| container.blockio.sectors_recursive | Number of sectors transferred to/from disk by the group and descendant groups. |

:::

::: details Network metrics

| Metric | Description |
|--------|-------------|
| container.network.io.usage.rx_bytes | Bytes received by the container. |
| container.network.io.usage.tx_bytes | Bytes sent. |
| container.network.io.usage.rx_dropped | Incoming packets dropped. |
| container.network.io.usage.tx_dropped | Outgoing packets dropped. |
| container.network.io.usage.rx_errors | Received errors. |
| container.network.io.usage.tx_errors | Sent errors. |
| container.network.io.usage.rx_packets | Packets received. |
| container.network.io.usage.tx_packets | Packets sent. |

:::

## Usage

The OpenTelemetry Docker Stats receiver provides a convenient way to collect performance metrics from Docker containers, which can help you monitor the health and performance of your containerized applications.

To start monitoring Docker, you need to configure the Docker Stats receiver in /etc/otel-contrib-collector/config.yaml using your Uptrace DSN:

```yaml
receivers:
  docker_stats:
    endpoint: unix:///var/run/docker.sock
    collection_interval: 15s
    container_labels_to_metric_labels:
      my.container.label: my-metric-label
      my.other.container.label: my-other-metric-label
    env_vars_to_metric_labels:
      MY_ENVIRONMENT_VARIABLE: my-metric-label
      MY_OTHER_ENVIRONMENT_VARIABLE: my-other-metric-label
    excluded_images:
      - undesired-container
      - /.*undesired.*/
      - another-*-container
    metrics:
      container.cpu.usage.percpu:
        enabled: true
      container.network.io.usage.tx_dropped:
        enabled: false

exporters:
  otlp:
    endpoint: api.uptrace.dev:4317
    headers: { 'uptrace-dsn': '<FIXME>' }

processors:
  resourcedetection:
    detectors: [env, system]
  cumulativetodelta:
  batch:
    timeout: 10s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp, docker_stats]
      processors: [resourcedetection, cumulativetodelta, batch]
      exporters: [otlp]
```

Note that the config above uses the /var/run/docker.sock Unix socket to communicate with Docker, so you need to make sure the OpenTelemetry Collector can access it.
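
If you run the Collector itself in a container, you can mount the socket into it. Below is a minimal docker-compose sketch, assuming the otel/opentelemetry-collector-contrib image and a config file sitting next to the compose file; depending on your setup, you may need to run the Collector as root or add its user to the docker group to get permission on the socket:

```yaml
# docker-compose.yaml (sketch): mount the Docker socket so the
# docker_stats receiver can talk to the Docker API.
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ['--config=/etc/otelcol-contrib/config.yaml']
    volumes:
      - ./config.yaml:/etc/otelcol-contrib/config.yaml
      - /var/run/docker.sock:/var/run/docker.sock
    # The socket is usually owned by root:docker; running as root is the
    # simplest (though not the most secure) way to grant access here.
    user: '0'
```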

Alternatively, you can configure the Docker daemon to also listen on 0.0.0.0:2375 and adjust the OpenTelemetry Collector config accordingly:

```yaml
receivers:
  docker_stats:
    endpoint: http://localhost:2375
```

Keep in mind that exposing the Docker API over plain TCP is insecure unless you protect it with TLS or firewall rules. See the Docker Stats receiver documentation for more details.
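
To quickly check that the receiver is actually producing metrics, you can temporarily add a debug exporter to the metrics pipeline and watch the Collector logs (a sketch, assuming a recent Collector release where the debug exporter replaces the older logging exporter):

```yaml
exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [otlp, docker_stats]
      processors: [resourcedetection, cumulativetodelta, batch]
      exporters: [otlp, debug]
```

This sketch keeps the otlp exporter from the earlier config; remove the debug exporter once you have confirmed that Docker metrics show up.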

## OpenTelemetry Backend

Once the metrics are collected and exported, you can visualize them using a compatible backend system. For example, you can use Uptrace to create dashboards that display metrics from the OpenTelemetry Collector.

Uptrace is a Grafana alternative that supports distributed tracing, metrics, and logs. You can use it to monitor applications and troubleshoot issues.

Uptrace comes with an intuitive query builder, rich dashboards, alerting rules with notifications, and integrations for most languages and frameworks.

Uptrace can process billions of spans and metrics on a single server and allows you to monitor your applications at 10x lower cost.

In just a few minutes, you can try Uptrace by visiting the cloud demo (no login required) or running it locally with Docker. The source code is available on GitHub.

## What's next?

Next, you can learn more about configuring the OpenTelemetry Collector to export data to a backend.