OpenTelemetry Erlang/Elixir Metrics API
OpenTelemetry Metrics provide a way to capture measurements about your application's behavior at runtime. Unlike traces, which show individual request flows, metrics aggregate data over time to show trends, patterns, and performance characteristics.
The OpenTelemetry Erlang/Elixir Metrics API is currently experimental and located in apps/opentelemetry_experimental_api of the opentelemetry-erlang repository. The API may change before stabilization. Use with caution in production environments.
Prerequisites
Ensure you have OpenTelemetry configured in your application. For setup instructions, see Monitor OpenTelemetry Erlang/Elixir with Uptrace.
Understanding Metrics
Metrics are numerical measurements captured over time that help you understand:
- Performance trends: Response times, throughput, error rates
- Resource utilization: Memory usage, CPU load, connection pools
- Business metrics: Active users, transactions, queue depths
- System health: Cache hit rates, retry counts, timeout frequencies
OpenTelemetry metrics are designed to be:
- Efficient: Low overhead collection suitable for production
- Flexible: Support for various aggregation strategies
- Standardized: Compatible with popular metrics backends
- Contextual: Can be correlated with traces and logs
Metric Instruments
OpenTelemetry provides several instrument types, each suited for different measurement scenarios:
Synchronous Instruments
Synchronous instruments are called directly in your application code when events occur:
- Counter: Monotonically increasing values (e.g., requests served, bytes sent)
- UpDownCounter: Values that increase and decrease (e.g., active connections, queue size)
- Histogram: Statistical distribution of values (e.g., request duration, response sizes)
- Gauge: Current value at observation time (e.g., CPU temperature, memory usage)
Asynchronous Instruments
Asynchronous instruments use callbacks to report values each time metrics are collected:
- Asynchronous Counter: Monotonic values sampled periodically
- Asynchronous UpDownCounter: Fluctuating values sampled periodically
- Asynchronous Gauge: Point-in-time values sampled periodically
Initialize MeterProvider
The MeterProvider is responsible for creating meters and managing metric collection. It must be configured during application startup:
# config/runtime.exs
import Config

config :opentelemetry_experimental,
  meters: [
    # Configure meters here
  ]

# In your application.ex
defmodule MyApp.Application do
  use Application
  require Logger

  def start(_type, _args) do
    # MeterProvider starts automatically with the OpenTelemetry SDK
    Logger.info("Metrics collection initialized")

    children = [
      # Your application children
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
Creating and Using Counters
Counters track cumulative values that only increase, such as the number of requests processed or bytes transmitted.
defmodule MyApp.Metrics do
  # Get meter for this application
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  # Create a counter instrument
  @request_counter @meter
    |> OpenTelemetry.Metrics.create_counter(
      "http.server.requests",
      description: "Total HTTP requests received",
      unit: "{request}"
    )

  def record_request(method, status_code) do
    # Increment counter with attributes
    OpenTelemetry.Metrics.Counter.add(
      @request_counter,
      1,
      %{
        "http.method" => method,
        "http.status_code" => status_code
      }
    )
  end
end

# Usage
MyApp.Metrics.record_request("GET", 200)
MyApp.Metrics.record_request("POST", 201)
Counter Best Practices
- Only increment: Never decrease counter values
- Use meaningful attributes: Add dimensions for filtering and grouping
- Keep cardinality low: Avoid high-cardinality attributes (e.g., user IDs)
- Choose appropriate units: Use standard units like {request}, {byte}, {error}
Creating and Using Histograms
Histograms capture the statistical distribution of values, perfect for measuring latencies, sizes, and durations:
defmodule MyApp.Metrics do
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  @request_duration @meter
    |> OpenTelemetry.Metrics.create_histogram(
      "http.server.request.duration",
      description: "HTTP request duration",
      unit: "ms"
    )

  def record_request_duration(duration_ms, endpoint) do
    OpenTelemetry.Metrics.Histogram.record(
      @request_duration,
      duration_ms,
      %{
        "http.route" => endpoint
      }
    )
  end

  def measure_operation(operation_name, func) do
    start_time = System.monotonic_time(:millisecond)

    try do
      result = func.()
      duration = System.monotonic_time(:millisecond) - start_time
      record_request_duration(duration, operation_name)
      {:ok, result}
    rescue
      error ->
        # Record the duration of failed operations as well
        duration = System.monotonic_time(:millisecond) - start_time
        record_request_duration(duration, operation_name)
        {:error, error}
    end
  end
end

# Usage
MyApp.Metrics.measure_operation("process_payment", fn ->
  # Business logic here
  :timer.sleep(150)
  :ok
end)
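To make "statistical distribution" concrete, the following is a minimal sketch of how an explicit-bucket histogram aggregation can count measurements. It is plain Elixir and does not call the OpenTelemetry SDK; the module name HistogramSketch is hypothetical, and the boundaries are the defaults listed in the OpenTelemetry metrics SDK specification. The real SDK aggregation is more involved (per-attribute-set state, min/max tracking, temporality).

```elixir
defmodule HistogramSketch do
  # Default explicit bucket boundaries from the OpenTelemetry metrics SDK specification.
  @default_boundaries [0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000]

  # Returns an empty state: one bucket per boundary plus a final overflow bucket.
  def new(boundaries \\ @default_boundaries) do
    %{
      boundaries: boundaries,
      counts: List.duplicate(0, length(boundaries) + 1),
      sum: 0,
      count: 0
    }
  end

  # Adds one measurement to the first bucket whose upper bound is >= value.
  def record(%{boundaries: boundaries, counts: counts} = state, value) do
    # Values above every boundary fall into the last (overflow) bucket.
    index = Enum.find_index(boundaries, &(value <= &1)) || length(boundaries)

    %{
      state
      | counts: List.update_at(counts, index, &(&1 + 1)),
        sum: state.sum + value,
        count: state.count + 1
    }
  end
end

# Hypothetical durations in milliseconds
state =
  Enum.reduce([3, 42, 150, 150, 20_000], HistogramSketch.new(), fn value, acc ->
    HistogramSketch.record(acc, value)
  end)
```

In this run both 150 ms samples land in the bucket with upper bound 250, and the 20,000 ms sample lands in the overflow bucket. A backend can estimate percentiles from such counts without storing every individual measurement.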
Creating and Using UpDownCounters
UpDownCounters track values that can both increase and decrease, such as active connections or items in a queue:
defmodule MyApp.ConnectionMetrics do
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  @active_connections @meter
    |> OpenTelemetry.Metrics.create_updown_counter(
      "db.connections.active",
      description: "Active database connections",
      unit: "{connection}"
    )

  def connection_opened(pool_name) do
    OpenTelemetry.Metrics.UpDownCounter.add(
      @active_connections,
      1,
      %{"db.pool.name" => pool_name}
    )
  end

  def connection_closed(pool_name) do
    OpenTelemetry.Metrics.UpDownCounter.add(
      @active_connections,
      -1,
      %{"db.pool.name" => pool_name}
    )
  end
end

# Usage in connection pool
MyApp.ConnectionMetrics.connection_opened("main_pool")
# ... use connection ...
MyApp.ConnectionMetrics.connection_closed("main_pool")
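An UpDownCounter only stays accurate if every increment is matched by a decrement, including when the guarded code raises. The sketch below illustrates that pairing with try/after. To keep it runnable without the OpenTelemetry SDK, it uses Erlang's :counters module as a stand-in for the instrument; the module name ActiveCountSketch is hypothetical.

```elixir
defmodule ActiveCountSketch do
  # A one-slot counter from Erlang's :counters module stands in for an
  # UpDownCounter so the example runs without the OpenTelemetry SDK.
  def new, do: :counters.new(1, [])

  def value(ref), do: :counters.get(ref, 1)

  # Runs `fun` while counting it as active. The decrement lives in `after`,
  # so it runs whether `fun` returns normally or raises.
  def with_connection(ref, fun) do
    :counters.add(ref, 1, 1)

    try do
      fun.()
    after
      :counters.sub(ref, 1, 1)
    end
  end
end

ref = ActiveCountSketch.new()

# Inside the function the count is 1; afterwards it is back to 0.
ActiveCountSketch.with_connection(ref, fn -> ActiveCountSketch.value(ref) end)
```

The same shape applies to the connection_opened/connection_closed pair above: place the closing call in an after clause so the count cannot drift upward after an exception.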
Creating and Using Gauges
Gauges capture point-in-time values that can change arbitrarily between observations:
defmodule MyApp.SystemMetrics do
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  @memory_usage @meter
    |> OpenTelemetry.Metrics.create_gauge(
      "process.memory.usage",
      description: "Current memory usage",
      unit: "By"
    )

  def record_memory_usage do
    # :erlang.memory/0 returns a keyword list of memory totals in bytes
    memory_info = :erlang.memory()
    total_memory = Keyword.get(memory_info, :total, 0)

    OpenTelemetry.Metrics.Gauge.record(
      @memory_usage,
      total_memory,
      %{"memory.type" => "total"}
    )
  end
end

# Can be called periodically or on-demand
MyApp.SystemMetrics.record_memory_usage()
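One way to call a synchronous gauge periodically is a small GenServer that reschedules itself with Process.send_after/3. This is a sketch under assumptions: the module name PeriodicSampler and its arguments are hypothetical, and it accepts any zero-arity function, so it runs without the OpenTelemetry SDK.

```elixir
defmodule PeriodicSampler do
  use GenServer

  # `sample_fun` is any zero-arity function, for example
  # `&MyApp.SystemMetrics.record_memory_usage/0`.
  def start_link(sample_fun, interval_ms) do
    GenServer.start_link(__MODULE__, {sample_fun, interval_ms})
  end

  @impl true
  def init({sample_fun, interval_ms}) do
    schedule(interval_ms)
    {:ok, %{sample_fun: sample_fun, interval_ms: interval_ms}}
  end

  @impl true
  def handle_info(:sample, state) do
    # Take one sample, then schedule the next one.
    state.sample_fun.()
    schedule(state.interval_ms)
    {:noreply, state}
  end

  defp schedule(interval_ms) do
    Process.send_after(self(), :sample, interval_ms)
  end
end

# Hypothetical usage: sample the BEAM's total memory every 100 ms
{:ok, sampler} = PeriodicSampler.start_link(fn -> :erlang.memory(:total) end, 100)
GenServer.stop(sampler)
```

In an application this process would normally be started under the supervision tree rather than by hand. If the value is only ever read on a timer, an asynchronous gauge is usually the simpler choice, because the SDK then decides when to sample.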
Asynchronous Instruments
Asynchronous instruments use callbacks to report values when metrics are collected, rather than being called directly in your code:
defmodule MyApp.AsyncMetrics do
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  def setup_async_metrics do
    # Asynchronous Gauge for system metrics
    @meter
    |> OpenTelemetry.Metrics.create_async_gauge(
      "system.cpu.utilization",
      description: "CPU utilization",
      unit: "1",
      callback: &cpu_utilization_callback/0
    )

    # Asynchronous UpDownCounter for queue depth
    @meter
    |> OpenTelemetry.Metrics.create_async_updown_counter(
      "queue.depth",
      description: "Number of items in queue",
      unit: "{item}",
      callback: &queue_depth_callback/0
    )
  end

  defp cpu_utilization_callback do
    # This function is called periodically by the metrics SDK.
    # :cpu_sup belongs to the :os_mon application, which must be started;
    # util/0 returns a percentage, so divide by 100 to get a 0-1 ratio.
    cpu_usage = :cpu_sup.util() / 100.0

    [
      {cpu_usage, %{"cpu.state" => "used"}},
      {1.0 - cpu_usage, %{"cpu.state" => "idle"}}
    ]
  end

  defp queue_depth_callback do
    # Query your queue system
    queue_size = MyApp.Queue.size()
    [{queue_size, %{"queue.name" => "main"}}]
  end
end

# Initialize once at application startup
MyApp.AsyncMetrics.setup_async_metrics()
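The callback contract can be pictured as follows: at collection time, something invokes each registered callback and turns the returned {value, attributes} tuples into data points. The sketch below is only a conceptual model in plain Elixir, not the SDK's implementation; CallbackCollectorSketch and the sample values are hypothetical.

```elixir
defmodule CallbackCollectorSketch do
  # `instruments` maps an instrument name to a zero-arity callback that
  # returns a list of `{value, attributes}` tuples. Each callback is
  # invoked exactly once per collection.
  def collect(instruments) do
    for {name, callback} <- instruments,
        {value, attributes} <- callback.() do
      %{name: name, value: value, attributes: attributes}
    end
  end
end

instruments = %{
  "queue.depth" => fn -> [{3, %{"queue.name" => "main"}}] end,
  "system.cpu.utilization" => fn ->
    [{0.25, %{"cpu.state" => "used"}}, {0.75, %{"cpu.state" => "idle"}}]
  end
}

# Produces three data points: one for the queue and two for the CPU states.
data_points = CallbackCollectorSketch.collect(instruments)
```

Because callbacks run on the collection path, they should return quickly and avoid blocking calls; a slow callback delays every other instrument collected in the same cycle.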
Practical Examples
HTTP Server Metrics
Complete example of instrumenting an HTTP handler:
defmodule MyApp.HTTPMetrics do
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  @request_counter @meter
    |> OpenTelemetry.Metrics.create_counter(
      "http.server.requests",
      description: "Total HTTP requests"
    )

  @request_duration @meter
    |> OpenTelemetry.Metrics.create_histogram(
      "http.server.duration",
      description: "HTTP request duration",
      unit: "ms"
    )

  @active_requests @meter
    |> OpenTelemetry.Metrics.create_updown_counter(
      "http.server.active_requests",
      description: "Active HTTP requests"
    )

  def track_request(method, path, func) do
    # Increment active requests
    OpenTelemetry.Metrics.UpDownCounter.add(@active_requests, 1)
    start_time = System.monotonic_time(:millisecond)

    try do
      result = func.()
      status_code = get_status_code(result)
      duration = System.monotonic_time(:millisecond) - start_time

      # Record metrics
      attributes = %{
        "http.method" => method,
        "http.route" => path,
        "http.status_code" => status_code
      }

      OpenTelemetry.Metrics.Counter.add(@request_counter, 1, attributes)
      OpenTelemetry.Metrics.Histogram.record(@request_duration, duration, attributes)
      result
    after
      # Decrement active requests, even if func raised
      OpenTelemetry.Metrics.UpDownCounter.add(@active_requests, -1)
    end
  end

  defp get_status_code({:ok, _}), do: 200
  defp get_status_code({:error, _}), do: 500
  defp get_status_code(_), do: 200
end
Database Connection Pool Metrics
defmodule MyApp.DBPoolMetrics do
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  def setup_pool_metrics(pool_name) do
    @meter
    |> OpenTelemetry.Metrics.create_async_gauge(
      "db.pool.connections",
      description: "Database pool connections",
      unit: "{connection}",
      callback: fn -> pool_stats_callback(pool_name) end
    )
  end

  defp pool_stats_callback(pool_name) do
    # :poolboy.status/1 returns a tuple:
    # {state_name, available_workers, overflow_workers, checked_out_workers}
    {_state, available, _overflow, checked_out} = :poolboy.status(pool_name)

    [
      {available + checked_out, %{"state" => "total", "pool" => pool_name}},
      {available, %{"state" => "idle", "pool" => pool_name}},
      {checked_out, %{"state" => "active", "pool" => pool_name}}
    ]
  end
end
Metric Naming Conventions
Follow OpenTelemetry semantic conventions for metric names:
- Use "." as the namespace separator: http.server.duration
- Use lowercase with underscores: process_memory_usage
- Include a unit suffix when not using the unit parameter: duration_ms
- Follow semantic conventions: OpenTelemetry Semantic Conventions
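Beyond these conventions, the OpenTelemetry metrics API specification restricts which characters an instrument name may contain: ASCII only, starting with a letter, followed by letters, digits, "_", ".", "-", or "/", with at most 255 characters in total. The following sketch checks a candidate name against that rule. The module name InstrumentNameSketch is hypothetical, and the check covers syntax only, not style.

```elixir
defmodule InstrumentNameSketch do
  # One leading letter, then up to 254 characters drawn from
  # letters, digits, "_", ".", "/", and "-".
  def valid?(name) when is_binary(name) do
    Regex.match?(~r|\A[A-Za-z][A-Za-z0-9_./-]{0,254}\z|, name)
  end

  def valid?(_name), do: false
end

InstrumentNameSketch.valid?("http.server.duration")
# => true
InstrumentNameSketch.valid?("1st.request.count")
# => false (must start with a letter)
InstrumentNameSketch.valid?("queue depth")
# => false (spaces are not allowed)
```

A check like this fits well in a unit test over the names an application defines, so that invalid names are caught before they reach the SDK.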
Common Units
- Time: ms (milliseconds), s (seconds)
- Data: By (bytes), KiBy (kibibytes)
- Percentages: 1 (ratio from 0-1)
- Counts: {request}, {connection}, {error}
Best Practices
Attribute Cardinality
Keep attribute cardinality low to avoid memory issues:
Good ✅:
%{"http.method" => "GET", "http.status_code" => 200}
Bad ❌:
%{"user.id" => "12345", "request.id" => "abc-def-ghi"} # Too many unique values
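The reason cardinality matters is multiplicative: each distinct combination of attribute values becomes its own time series. A rough upper bound is the product of the number of distinct values per attribute, as in this sketch. The module name CardinalitySketch and the value counts are hypothetical figures chosen for illustration.

```elixir
defmodule CardinalitySketch do
  # Upper bound on the number of distinct attribute sets (time series) one
  # instrument can produce, given how many distinct values each attribute takes.
  def max_series(distinct_values_per_attribute) do
    distinct_values_per_attribute
    |> Map.values()
    |> Enum.reduce(1, fn count, acc -> count * acc end)
  end
end

# Assumed: 5 HTTP methods and 10 status codes in use
CardinalitySketch.max_series(%{"http.method" => 5, "http.status_code" => 10})
# => 50

# Assumed: 100,000 distinct users
CardinalitySketch.max_series(%{"http.method" => 5, "user.id" => 100_000})
# => 500000
```

Adding a single per-user attribute turns a few dozen series into hundreds of thousands, each of which is held in memory and exported.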
Metric Selection
Choose the right instrument:
| Use Case | Instrument |
|---|---|
| Total requests | Counter |
| Request duration distribution | Histogram |
| Active connections | UpDownCounter or Async Gauge |
| Current temperature | Gauge or Async Gauge |
| Total bytes sent | Counter |
| Queue length | UpDownCounter or Async Gauge |
Performance Tips
- Create instruments once at startup, not per-request
- Use asynchronous instruments for polled data
- Batch metric updates when possible
- Keep attribute values simple (strings, numbers, booleans)