# OpenTelemetry Docker Monitoring

Vladimir Mihailenco
March 01, 2024
3 min read

Docker has gained popularity due to its ability to simplify application deployment, improve scalability, and increase development productivity. It has become a standard tool in many software development and deployment workflows.

By using the OpenTelemetry Docker Stats receiver, you can gather container-level metrics from Docker and integrate them into your observability infrastructure for monitoring and analysis purposes.

## What is OpenTelemetry Collector?

The OpenTelemetry Collector facilitates the collection, processing, and export of telemetry data from multiple sources. It acts as an intermediary between applications and observability backends, enabling unified data collection and export.

With the OpenTelemetry Collector, you can centralize and standardize your telemetry data collection, apply data processing operations, and seamlessly export data to multiple OpenTelemetry-compatible APM backends. It supports a range of processors that can manipulate data, apply sampling strategies, and perform other data transformations based on your requirements.
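
For illustration, here is a minimal sketch of a Collector configuration that receives OTLP metrics, batches them, and forwards them to an OTLP-compatible backend (the exporter endpoint is a placeholder):

```yaml
# Minimal Collector pipeline: receive OTLP data, batch it, export it.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlp:
    # Placeholder endpoint; point it at your backend.
    endpoint: localhost:4317

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```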

## OpenTelemetry Docker Stats

The OpenTelemetry Docker Stats receiver allows you to collect container-level resource metrics from Docker. It retrieves metrics such as CPU usage, memory usage, network statistics, and disk I/O from Docker containers and exposes them as OpenTelemetry metrics.

::: details CPU metrics

| Metric | Description |
|--------|-------------|
| container.cpu.usage.system | System CPU usage, as reported by docker. |
| container.cpu.usage.total | Total CPU time consumed. |
| container.cpu.usage.kernelmode | Time spent by tasks of the cgroup in kernel mode (Linux). |
| container.cpu.usage.usermode | Time spent by tasks of the cgroup in user mode (Linux). |
| container.cpu.usage.percpu | Per-core CPU usage by the container. |
| container.cpu.throttling_data.periods | Number of periods with throttling active. |
| container.cpu.throttling_data.throttled_periods | Number of periods when the container hits its throttling limit. |
| container.cpu.throttling_data.throttled_time | Aggregate time the container was throttled. |
| container.cpu.percent | Percent of CPU used by the container. |

:::

::: details Memory metrics

| Metric | Description |
|--------|-------------|
| container.memory.usage.limit | Memory limit of the container. |
| container.memory.usage.total | Memory usage of the container. This excludes the cache. |
| container.memory.usage.max | Maximum memory usage. |
| container.memory.percent | Percentage of memory used. |
| container.memory.cache | The amount of memory used by the processes of this control group that can be associated precisely with a block on a block device. |
| container.memory.rss | The amount of memory that doesn’t correspond to anything on disk: stacks, heaps, and anonymous memory maps. |
| container.memory.rss_huge | Number of bytes of anonymous transparent hugepages in this cgroup. |
| container.memory.dirty | Bytes that are waiting to get written back to the disk, from this cgroup. |
| container.memory.writeback | Number of bytes of file/anon cache that are queued for syncing to disk in this cgroup. |
| container.memory.mapped_file | Indicates the amount of memory mapped by the processes in the control group. |
| container.memory.swap | The amount of swap currently used by the processes in this cgroup. |

:::

::: details BlockIO metrics

| Metric | Description |
|--------|-------------|
| container.blockio.io_merged_recursive | Number of bios/requests merged into requests belonging to this cgroup and its descendant cgroups. |
| container.blockio.io_queued_recursive | Number of requests queued up for this cgroup and its descendant cgroups. |
| container.blockio.io_service_bytes_recursive | Number of bytes transferred to/from the disk by the group and descendant groups. |
| container.blockio.io_service_time_recursive | Total amount of time in nanoseconds between request dispatch and request completion for the IOs done by this cgroup and descendant cgroups. |
| container.blockio.io_serviced_recursive | Number of IOs (bio) issued to the disk by the group and descendant groups. |
| container.blockio.io_time_recursive | Disk time allocated to cgroup (and descendant cgroups) per device in milliseconds. |
| container.blockio.io_wait_time_recursive | Total amount of time the IOs for this cgroup (and descendant cgroups) spent waiting in the scheduler queues for service. |
| container.blockio.sectors_recursive | Number of sectors transferred to/from disk by the group and descendant groups. |

:::

::: details Network metrics

| Metric | Description |
|--------|-------------|
| container.network.io.usage.rx_bytes | Bytes received by the container. |
| container.network.io.usage.tx_bytes | Bytes sent. |
| container.network.io.usage.rx_dropped | Incoming packets dropped. |
| container.network.io.usage.tx_dropped | Outgoing packets dropped. |
| container.network.io.usage.rx_errors | Received errors. |
| container.network.io.usage.tx_errors | Sent errors. |
| container.network.io.usage.rx_packets | Packets received. |
| container.network.io.usage.tx_packets | Packets sent. |

:::

## Usage

The OpenTelemetry Docker Stats receiver provides a convenient way to collect performance metrics from Docker containers, which can help you monitor the health and performance of your containerized applications.

To start monitoring Docker, you need to configure the Docker Stats receiver in /etc/otel-contrib-collector/config.yaml using your Uptrace DSN:

```yaml
receivers:
  docker_stats:
    endpoint: unix:///var/run/docker.sock
    collection_interval: 15s
    container_labels_to_metric_labels:
      my.container.label: my-metric-label
      my.other.container.label: my-other-metric-label
    env_vars_to_metric_labels:
      MY_ENVIRONMENT_VARIABLE: my-metric-label
      MY_OTHER_ENVIRONMENT_VARIABLE: my-other-metric-label
    excluded_images:
      - undesired-container
      - /.*undesired.*/
      - another-*-container
    metrics:
      container.cpu.usage.percpu:
        enabled: true
      container.network.io.usage.tx_dropped:
        enabled: false

exporters:
  otlp:
    endpoint: api.uptrace.dev:4317
    headers: { 'uptrace-dsn': '<FIXME>' }

processors:
  resourcedetection:
    detectors: [env, system]
  cumulativetodelta:
  batch:
    timeout: 10s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp, docker_stats]
      processors: [resourcedetection, cumulativetodelta, batch]
      exporters: [otlp]
```

Note that the config above uses the /var/run/docker.sock Unix socket to communicate with Docker, so you need to make sure the OpenTelemetry Collector can access it.
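
If you run the Collector itself in a container, you can mount the socket into it. Below is a minimal docker-compose sketch, assuming the otel/opentelemetry-collector-contrib image and a config file sitting next to the compose file; depending on your setup, you may need to run the Collector as root or add its user to the docker group to get permission on the socket:

```yaml
# docker-compose.yaml (sketch): mount the Docker socket so the
# docker_stats receiver can talk to the Docker API.
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ['--config=/etc/otelcol-contrib/config.yaml']
    volumes:
      - ./config.yaml:/etc/otelcol-contrib/config.yaml
      - /var/run/docker.sock:/var/run/docker.sock
    # The socket is usually owned by root:docker; running as root is the
    # simplest (though not the most secure) way to grant access here.
    user: '0'
```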

Alternatively, you can configure the Docker daemon to also listen on 0.0.0.0:2375 and adjust the OpenTelemetry Collector config accordingly:

```yaml
receivers:
  docker_stats:
    endpoint: http://localhost:2375
```

Keep in mind that exposing the Docker API over plain TCP is insecure unless you protect it with TLS or firewall rules. See the Docker Stats receiver documentation for more details.
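
To quickly check that the receiver is actually producing metrics, you can temporarily add a debug exporter to the metrics pipeline and watch the Collector logs (a sketch, assuming a recent Collector release where the debug exporter replaces the older logging exporter):

```yaml
exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [otlp, docker_stats]
      processors: [resourcedetection, cumulativetodelta, batch]
      exporters: [otlp, debug]
```

This sketch keeps the otlp exporter from the earlier config; remove the debug exporter once you have confirmed that Docker metrics show up.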

## OpenTelemetry Backend

Once the metrics are collected and exported, you can visualize them using a compatible backend system. For example, you can use Uptrace to create dashboards that display metrics from the OpenTelemetry Collector.

Uptrace is a Grafana alternative that supports distributed tracing, metrics, and logs. You can use it to monitor applications and troubleshoot issues.

Uptrace comes with an intuitive query builder, rich dashboards, alerting rules with notifications, and integrations for most languages and frameworks.

Uptrace can process billions of spans and metrics on a single server and allows you to monitor your applications at 10x lower cost.

In just a few minutes, you can try Uptrace by visiting the cloud demo (no login required) or running it locally with Docker. The source code is available on GitHub.

## What's next?

Next, you can learn more about configuring the OpenTelemetry Collector to export data to a backend.