OpenTelemetry Host Metrics receiver
The hostmetricsreceiver is an OpenTelemetry Collector plugin that collects comprehensive system-level metrics from host machines. It provides critical insights into system performance, resource utilization, and potential bottlenecks that could impact your applications.
Overview
By collecting and analyzing host metrics, you can:
- Monitor system health in real-time across your infrastructure
- Identify performance bottlenecks before they impact users
- Track resource utilization trends for capacity planning
- Set up alerts for anomalous behavior or resource exhaustion
- Correlate system metrics with application performance issues
Prerequisites
Before you begin, ensure you have:
- OpenTelemetry Collector installed (installation guide)
- Appropriate permissions to collect system metrics
- Access to modify the Collector configuration
- A telemetry backend that can receive metrics (e.g., Uptrace, Prometheus)
What is OpenTelemetry Collector?
OpenTelemetry Collector is a vendor-agnostic agent that:
- Collects telemetry data from various sources using receivers
- Processes data through a pipeline (filtering, aggregating, transforming)
- Exports data to one or multiple observability backends
The Collector provides a unified way to receive, process, and export telemetry data (traces, metrics, and logs), eliminating the need for multiple agents.
Key Benefits
- Simplified deployment - Single agent for all telemetry data
- Vendor neutrality - Switch backends without changing instrumentation
- Powerful processing - Transform and enrich data before export
- Reduced overhead - Efficient batching and compression
Quick Start Guide
Basic Configuration
To start collecting host metrics immediately, add this minimal configuration to your otel-collector-config.yaml:
```yaml
receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu:
      memory:
      disk:
      filesystem:
      network:
      load:

processors:
  batch:
    timeout: 10s

exporters:
  # Configure your exporter here
  otlp:
    endpoint: "your-backend:4317"

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [batch]
      exporters: [otlp]
```
Starting the Collector
```bash
# Run with your configuration
otelcol-contrib --config=otel-collector-config.yaml

# Or as a service (systemd)
sudo systemctl start otelcol-contrib
```
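Recent Collector builds also ship a validate subcommand, which is a quick way to catch YAML or configuration mistakes before (re)starting the service. Treat the command below as a sketch and confirm it is available in your otelcol-contrib version:

```bash
# Check the configuration without starting the Collector
otelcol-contrib validate --config=otel-collector-config.yaml
```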
Complete Configuration Example
Here's a comprehensive configuration that includes all available scrapers and recommended processors:
```yaml
processors:
  # Detect and add resource attributes
  resourcedetection:
    detectors: [env, system, docker, ec2, gcp, azure]
    timeout: 5s
    override: false

  # Convert cumulative metrics to delta
  cumulativetodelta:

  # Batch data for efficient export
  batch:
    timeout: 10s
    send_batch_size: 1024

receivers:
  hostmetrics:
    # How often to collect metrics
    collection_interval: 10s

    # Configure individual scrapers
    scrapers:
      # CPU utilization metrics
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true

      # Disk I/O metrics
      disk:
        # Exclude specific devices
        exclude:
          devices: ["^/dev/loop.*"]
          match_type: regexp

      # Filesystem utilization metrics
      filesystem:
        exclude_mount_points:
          mount_points: ["/tmp/.*", "/dev/shm"]
          match_type: regexp
        include_fs_types:
          fs_types: [ext3, ext4, xfs, btrfs]
          match_type: strict

      # CPU load average metrics (1, 5, 15 minutes)
      load:

      # Memory utilization metrics
      memory:

      # Network interface I/O metrics & TCP connection metrics
      network:
        include:
          interfaces: ["eth0", "eth1"]
          match_type: strict

      # Paging/swap space utilization and I/O metrics
      paging:

      # Process count by status
      processes:

      # Per-process metrics (requires elevated permissions)
      process:
        include:
          names: ["nginx", "postgres", "redis"]
          match_type: strict
        mute_process_name_error: true

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [resourcedetection, cumulativetodelta, batch]
      exporters: [otlp]
```
Available Scrapers
The hostmetricsreceiver includes multiple scrapers, each collecting specific metric types:
CPU Scraper
Collects CPU utilization metrics per core and aggregated.
Metrics collected:
- system.cpu.time - CPU time in different states (user, system, idle, etc.)
- system.cpu.utilization - CPU utilization percentage
Example configuration:
```yaml
cpu:
  metrics:
    system.cpu.utilization:
      enabled: true
    system.cpu.logical.count:
      enabled: true
```
Memory Scraper
Monitors RAM usage and availability.
Metrics collected:
- system.memory.usage - Memory usage by state
- system.memory.utilization - Memory utilization percentage
Example configuration:
```yaml
memory:
  metrics:
    system.memory.utilization:
      enabled: true
```
Disk Scraper
Tracks disk I/O operations and performance.
Metrics collected:
- system.disk.io - Bytes read/written
- system.disk.operations - Read/write operation counts
- system.disk.io_time - Time spent on I/O operations
- system.disk.merged - Merged read/write operations
Example configuration:
```yaml
disk:
  # Collect metrics for specific devices only
  include:
    devices: ["sda", "sdb"]
    match_type: strict
  exclude:
    devices: ["^/dev/loop.*"]
    match_type: regexp
```
Filesystem Scraper
Monitors filesystem usage and capacity.
Metrics collected:
- system.filesystem.usage - Used/free space
- system.filesystem.utilization - Usage percentage
- system.filesystem.inodes.usage - Inode usage
Example configuration:
```yaml
filesystem:
  include_fs_types:
    match_type: regexp
    fs_types: ["^ext[234]$", "^xfs$"]
  exclude_mount_points:
    mount_points: ["/boot", "/dev/.*"]
    match_type: regexp
  include_virtual_filesystems: false
```
Network Scraper
Collects network interface and connection statistics.
Metrics collected:
- system.network.io - Bytes sent/received
- system.network.packets - Packets sent/received
- system.network.errors - Network errors
- system.network.connections - TCP connection states
Example configuration:
```yaml
network:
  include:
    interfaces: ["eth.*", "en.*"]
    match_type: regexp
  exclude:
    interfaces: ["lo", "docker.*"]
    match_type: regexp
```
Load Scraper
Reports system load averages (Linux/Unix only).
Metrics collected:
- system.cpu.load_average.1m - 1-minute load average
- system.cpu.load_average.5m - 5-minute load average
- system.cpu.load_average.15m - 15-minute load average
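The load scraper needs no configuration to emit these metrics. If you prefer load values normalized by CPU count, the scraper's cpu_average option divides each average by the number of logical CPUs; the snippet below is a sketch, so confirm the option against your Collector version:

```yaml
load:
  # Divide load averages by the number of logical CPUs
  cpu_average: true
```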
Paging Scraper
Monitors swap/paging activity.
Metrics collected:
- system.paging.usage - Swap space usage
- system.paging.operations - Page in/out operations
- system.paging.faults - Page fault counts
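The paging scraper also works without configuration. The sketch below enables an optional utilization metric, assuming system.paging.utilization is available as an optional metric in your Collector version:

```yaml
paging:
  metrics:
    system.paging.utilization:
      enabled: true
```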
Process Scrapers
Two scrapers provide process-level insights:
Process Scraper - Aggregate process counts:
system.processes.running- Running processessystem.processes.blocked- Blocked processes
Processes Scraper - Per-process metrics:
process.cpu.time- CPU time per processprocess.memory.usage- Memory usage per processprocess.disk.io- Disk I/O per process
Advanced Configuration
Filesystem Metrics Configuration
For production environments with diverse filesystem types:
```yaml
receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      filesystem:
        # Only collect from specific filesystem types
        include_fs_types:
          fs_types: [ext3, ext4, xfs, btrfs, zfs]
          match_type: strict
        # Exclude temporary and virtual mount points
        exclude_mount_points:
          mount_points:
            - "/tmp/.*"
            - "/dev/shm"
            - "/run/.*"
            - "/sys/.*"
          match_type: regexp
        # Exclude temporary and virtual filesystem types
        exclude_fs_types:
          fs_types: [tmpfs, devtmpfs, autofs]
          match_type: strict
        # Whether to include metrics for virtual filesystems
        include_virtual_filesystems: false
```
Process Metrics Configuration
To collect detailed per-process metrics:
```yaml
receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      # Aggregate process counts
      processes:
        metrics:
          system.processes.created:
            enabled: true

      # Detailed per-process metrics
      process:
        # Include specific processes by name
        include:
          names: ["nginx", "mysql", "redis", "postgres"]
          match_type: regexp
        # Or exclude specific processes
        exclude:
          names: ["^kernel.*"]
          match_type: regexp
        # Mute errors for processes that terminate during collection
        mute_process_name_error: true
```
Permission Requirements
Process metrics require elevated permissions. Configure based on your system:
Linux (systemd):
```ini
# /lib/systemd/system/otelcol-contrib.service
[Service]
User=root
Group=root

# Or use capabilities instead of root
AmbientCapabilities=CAP_SYS_PTRACE CAP_DAC_READ_SEARCH
```
Docker:
```bash
docker run --pid=host --cap-add=SYS_PTRACE ...
```
Kubernetes:
```yaml
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: otel-collector
      securityContext:
        capabilities:
          add: ["SYS_PTRACE", "DAC_READ_SEARCH"]
```
Container Deployments
Docker Configuration
When running the Collector in a container, mount host directories to collect host (not container) metrics:
```bash
# Full host filesystem access
docker run \
  -v /:/hostfs:ro \
  -v $(pwd)/config.yaml:/etc/otel-collector-config.yaml \
  otel/opentelemetry-collector-contrib:latest \
  --config=/etc/otel-collector-config.yaml

# Minimal required mounts
docker run \
  -v /proc:/hostfs/proc:ro \
  -v /sys:/hostfs/sys:ro \
  -v /etc/hostname:/hostfs/etc/hostname:ro \
  -v $(pwd)/config.yaml:/etc/otel-collector-config.yaml \
  otel/opentelemetry-collector-contrib:latest \
  --config=/etc/otel-collector-config.yaml
```
Configuration with root_path:
```yaml
receivers:
  hostmetrics:
    root_path: /hostfs
    collection_interval: 10s
    scrapers:
      cpu:
      memory:
      disk:
      filesystem:
      network:
```
Kubernetes DaemonSet
Deploy as a DaemonSet to collect metrics from all nodes:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      serviceAccountName: otel-collector
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/etc/otel-collector-config.yaml"]
          volumeMounts:
            - name: hostfs
              mountPath: /hostfs
              readOnly: true
            - name: config
              mountPath: /etc/otel-collector-config.yaml
              subPath: config.yaml
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
      volumes:
        - name: hostfs
          hostPath:
            path: /
        - name: config
          configMap:
            name: otel-collector-config
      hostNetwork: true
      hostPID: true
```
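The NODE_NAME variable is not consumed by the hostmetrics receiver itself. One common pattern, sketched below with the standard resource processor and the Collector's ${env:...} substitution syntax, is to attach it as a resource attribute so metrics can be grouped by node:

```yaml
processors:
  resource/node:
    attributes:
      # Attach the Kubernetes node name provided via the Downward API
      - key: k8s.node.name
        value: ${env:NODE_NAME}
        action: upsert
```

Remember to add resource/node to the metrics pipeline alongside the other processors.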
Resource Detection
The Resource Detection Processor automatically discovers and adds metadata about the environment:
```yaml
processors:
  resourcedetection:
    # Detectors to run
    detectors: [env, system, docker, ec2, gcp, azure, aks, eks, gke]
    # Detection timeout
    timeout: 5s
    # Override existing attributes
    override: false
    # Configure specific detectors
    system:
      hostname_sources: [dns, os, cname]
    ec2:
      tags:
        - "^env$"
        - "^team$"
    docker:
      use_hostname_if_available: true
```
Available Detectors
- env - Environment variables
- system - System information (hostname, OS, architecture)
- docker - Docker container metadata
- ec2 - AWS EC2 instance metadata
- ecs - AWS ECS task metadata
- eks - AWS EKS cluster information
- gcp - Google Cloud Platform metadata
- gke - Google Kubernetes Engine metadata
- azure - Azure VM metadata
- aks - Azure Kubernetes Service metadata
- k8snode - Kubernetes node metadata
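The env detector reads resource attributes from the standard OTEL_RESOURCE_ATTRIBUTES environment variable, which is a convenient way to inject deployment metadata without editing the config file. A minimal sketch (the attribute keys are just examples):

```yaml
# Set on the host or in the container environment, e.g.:
#   OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production,team=platform
processors:
  resourcedetection/env:
    detectors: [env]
```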
Hostname Configuration
Control how hostnames are determined:
```yaml
processors:
  resourcedetection/custom:
    detectors: [system]
    system:
      # Available sources: dns, os, cname, lookup
      # "dns" and "lookup" can return a fully qualified domain name
      hostname_sources: [dns, os]
```
Performance Optimization
Collection Interval Tuning
Balance between data granularity and resource usage:
```yaml
receivers:
  hostmetrics:
    # High frequency for critical metrics
    collection_interval: 5s
    scrapers:
      cpu:
      memory:

  hostmetrics/detailed:
    # Lower frequency for detailed metrics
    collection_interval: 30s
    scrapers:
      filesystem:
      process:
```
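Both receiver instances then need to be referenced in a metrics pipeline. A sketch, reusing the processor and exporter names from the quick start example:

```yaml
service:
  pipelines:
    metrics:
      receivers: [hostmetrics, hostmetrics/detailed]
      processors: [batch]
      exporters: [otlp]
```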
Metric Filtering
Reduce data volume by filtering unnecessary metrics:
```yaml
processors:
  filter/metrics:
    metrics:
      exclude:
        match_type: regexp
        metric_names:
          - "system\\.network\\.io.*"
          - "system\\.disk\\.merged"
```
Batching Configuration
Optimize export efficiency:
```yaml
processors:
  batch:
    # Maximum time to wait before sending a batch
    timeout: 10s
    # Batch size that triggers an immediate send
    send_batch_size: 1024
    # Upper limit on the size of any single batch
    send_batch_max_size: 2048
```
Troubleshooting
Common Issues and Solutions
1. Missing Metrics
Problem: Some expected metrics aren't being collected.
Solutions:
- Check if the scraper is enabled in configuration
- Verify permissions (especially for process metrics)
- Ensure filesystem/device isn't excluded
- Check collector logs for errors
2. High CPU Usage
Problem: Collector consuming excessive CPU.
Solutions:
- Increase collection interval
- Reduce number of processes tracked
- Filter out unnecessary metrics
- Enable metric aggregation
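Besides filtering after collection, individual metrics can be turned off at the scraper itself so they are never produced. A sketch disabling one disk metric; check the scraper documentation for the exact metric names and defaults in your Collector version:

```yaml
receivers:
  hostmetrics:
    scrapers:
      disk:
        metrics:
          # Skip a metric you do not chart or alert on
          system.disk.merged:
            enabled: false
```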
3. Permission Denied Errors
Problem: Cannot collect process or certain system metrics.
Solutions:
```bash
# Check which user the Collector runs as
ps aux | grep otelcol

# Run with elevated permissions
sudo systemctl edit otelcol-contrib
# Add: User=root

# Or add specific capabilities
sudo setcap cap_sys_ptrace,cap_dac_read_search+ep /usr/bin/otelcol-contrib
```
4. Container Metrics Instead of Host
Problem: Seeing container metrics when expecting host metrics.
Solution:
```yaml
# Ensure root_path is set
receivers:
  hostmetrics:
    root_path: /hostfs
```

```bash
# Verify mounts
docker inspect <container_id> | grep Mounts -A 20
```
Debug Logging
Enable debug logging to troubleshoot issues:
```yaml
service:
  telemetry:
    logs:
      level: debug
      development: true
```
Monitoring Best Practices
1. Establish Baselines
- Collect metrics for at least 1-2 weeks
- Identify normal operating ranges
- Document peak usage patterns
2. Set Meaningful Alerts
Example alert thresholds:
- CPU utilization > 80% for 5 minutes
- Memory utilization > 90% for 10 minutes
- Disk usage > 85%
- Load average > number of CPU cores
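How these thresholds are expressed depends entirely on your backend. As a rough sketch only, assuming the metrics reach Prometheus via the Collector's Prometheus exporter with default naming (so system.cpu.utilization becomes system_cpu_utilization and is a 0-1 fraction, and system.filesystem.utilization has been enabled in the filesystem scraper), the first and third thresholds could look like:

```yaml
groups:
  - name: host-metrics
    rules:
      # CPU busy for 5 minutes: 1 minus the average idle fraction
      - alert: HighCpuUtilization
        expr: 1 - avg(system_cpu_utilization{state="idle"}) > 0.80
        for: 5m
      # Any filesystem above 85% used
      - alert: HighDiskUsage
        expr: system_filesystem_utilization > 0.85
        for: 10m
```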
3. Create Dashboards
Group related metrics for quick insights:
- System Overview - CPU, memory, load
- Storage - Disk usage, I/O, filesystem
- Network - Traffic, errors, connections
- Processes - Top consumers, process counts
4. Correlate with Application Metrics
- Link system metrics with application performance
- Identify resource constraints affecting applications
- Plan capacity based on usage trends
Integration with Uptrace
To send metrics to Uptrace:
```yaml
exporters:
  otlp/uptrace:
    endpoint: otlp.uptrace.dev:4317
    headers:
      uptrace-dsn: "your-project-dsn"
    compression: gzip

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [resourcedetection, batch]
      exporters: [otlp/uptrace]
```
See the complete Uptrace integration guide for detailed setup instructions.
Frequently Asked Questions
Q: How much overhead does the hostmetricsreceiver add?
A: Typically less than 1% CPU and 50MB memory for standard configurations. Process-level metrics may increase this to 2-3% CPU.
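To measure the overhead on your own hosts, the Collector can expose metrics about itself (CPU, memory, queue sizes). A sketch, assuming a Collector version where service.telemetry.metrics still accepts an address field (newer releases configure this through readers instead):

```yaml
service:
  telemetry:
    metrics:
      # Expose the Collector's own metrics in Prometheus format
      level: detailed
      address: 0.0.0.0:8888
```

Scraping that endpoint typically shows the Collector's own resource usage, for example otelcol_process_memory_rss and otelcol_process_cpu_seconds.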
Q: Can I collect metrics from remote hosts?
A: No, the hostmetricsreceiver only collects metrics from the local system. For remote hosts, install the Collector on each target system or use the SNMP receiver.
Q: How do I collect Windows-specific metrics?
A: The hostmetricsreceiver automatically adapts to Windows, collecting Performance Counter equivalents. Some scrapers like 'load' are Linux-specific.
Q: What's the difference between process and processes scrapers?
A:
- processes - Aggregate counts (total running, blocked, etc.)
- process - Detailed per-process metrics (CPU, memory, disk I/O per process)
Q: How can I reduce the amount of data collected?
A:
- Increase collection interval
- Disable unnecessary scrapers
- Use include/exclude filters
- Apply metric filtering processors
- Aggregate metrics before export
Q: Can I collect custom metrics from /proc or /sys?
A: Not directly with hostmetricsreceiver. Consider using the filereceiver or writing a custom receiver.
Security Considerations
Principle of Least Privilege
- Run with minimal required permissions
- Use capabilities instead of root when possible
- Restrict file system access with read-only mounts
Sensitive Data
- Process names may reveal application architecture
- Network connections could expose service topology
- Consider filtering sensitive metrics before export
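One way to do this is to strip sensitive resource attributes before export. The sketch below uses the standard resource processor; the attribute names follow the OpenTelemetry semantic conventions for per-process metrics, so adjust them to what your pipeline actually emits:

```yaml
processors:
  resource/scrub:
    attributes:
      # Command lines can contain secrets passed as flags
      - key: process.command_line
        action: delete
      # The process owner may be considered sensitive in some environments
      - key: process.owner
        action: delete
```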
Configuration Security
```yaml
# Use environment variables for sensitive data
exporters:
  otlp:
    endpoint: ${OTLP_ENDPOINT}
    headers:
      api-key: ${API_KEY}
```
What's Next?
Expand your observability stack with these complementary receivers:
- OpenTelemetry Kubernetes - Collect Kubernetes cluster metrics and metadata
- OpenTelemetry Docker - Monitor Docker containers and daemon metrics
- OpenTelemetry Redis - Track Redis performance and usage
- OpenTelemetry PostgreSQL - Monitor PostgreSQL database metrics
- Prometheus Integration - Scrape Prometheus metrics