OpenTelemetry Erlang/Elixir Metrics API

OpenTelemetry Metrics provide a way to capture measurements about your application's behavior at runtime. Unlike traces, which show individual request flows, metrics aggregate data over time to reveal trends, patterns, and performance characteristics.

The OpenTelemetry Erlang/Elixir Metrics API is currently experimental and lives in apps/opentelemetry_api_experimental of the opentelemetry-erlang repository. The API may change before stabilization, so use it with caution in production environments.

Prerequisites

Ensure you have OpenTelemetry configured in your application. For setup instructions, see Monitor OpenTelemetry Erlang/Elixir with Uptrace.

Understanding Metrics

Metrics are numerical measurements captured over time that help you understand:

  • Performance trends: Response times, throughput, error rates
  • Resource utilization: Memory usage, CPU load, connection pools
  • Business metrics: Active users, transactions, queue depths
  • System health: Cache hit rates, retry counts, timeout frequencies

OpenTelemetry metrics are designed to be:

  • Efficient: Low overhead collection suitable for production
  • Flexible: Support for various aggregation strategies
  • Standardized: Compatible with popular metrics backends
  • Contextual: Can be correlated with traces and logs

Metric Instruments

OpenTelemetry provides several instrument types, each suited for different measurement scenarios:

Synchronous Instruments

Synchronous instruments are called directly in your application code when events occur:

  • Counter: Monotonically increasing values (e.g., requests served, bytes sent)
  • UpDownCounter: Values that increase and decrease (e.g., active connections, queue size)
  • Histogram: Statistical distribution of values (e.g., request duration, response sizes)
  • Gauge: Current value at observation time (e.g., CPU temperature, memory usage)

Asynchronous Instruments

Asynchronous instruments use callbacks to report values when metrics are collected:

  • Asynchronous Counter: Monotonic values sampled periodically
  • Asynchronous UpDownCounter: Fluctuating values sampled periodically
  • Asynchronous Gauge: Point-in-time values sampled periodically

Initialize MeterProvider

The MeterProvider is responsible for creating meters and managing metric collection. It must be configured during application startup:

Elixir:
# config/runtime.exs
config :opentelemetry_experimental,
  meters: [
    # Configure meters here
  ]

# In your application.ex
defmodule MyApp.Application do
  use Application
  require Logger

  def start(_type, _args) do
    # MeterProvider starts automatically with the OpenTelemetry SDK
    Logger.info("Metrics collection initialized")

    children = [
      # Your application children
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

Erlang:
%% sys.config
[
 {opentelemetry_experimental, [
   %% Metrics configuration
 ]}
].

%% In your application module
-module(my_app).
-behaviour(application).

-export([start/2, stop/1]).

start(_Type, _Args) ->
    %% MeterProvider starts automatically with the OpenTelemetry SDK
    error_logger:info_msg("Metrics collection initialized"),

    my_app_sup:start_link().

stop(_State) ->
    ok.

Creating and Using Counters

Counters track cumulative values that only increase, such as the number of requests processed or bytes transmitted.

Elixir:
defmodule MyApp.Metrics do
  # Get meter for this application
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  # Create a counter instrument
  @request_counter @meter
                   |> OpenTelemetry.Metrics.create_counter(
                     "http.server.requests",
                     description: "Total HTTP requests received",
                     unit: "{request}"
                   )

  def record_request(method, status_code) do
    # Increment counter with attributes
    OpenTelemetry.Metrics.Counter.add(
      @request_counter,
      1,
      %{
        "http.method" => method,
        "http.status_code" => status_code
      }
    )
  end
end

# Usage
MyApp.Metrics.record_request("GET", 200)
MyApp.Metrics.record_request("POST", 201)

Erlang:
-module(my_app_metrics).
-export([record_request/2, init/0]).

-define(METER, opentelemetry_metrics:get_meter(?MODULE)).

init() ->
    %% Create counter instrument
    RequestCounter = otel_meter:create_counter(
        ?METER,
        <<"http.server.requests">>,
        #{
            description => <<"Total HTTP requests received">>,
            unit => <<"{request}">>
        }
    ),

    %% Store in application environment or ETS
    application:set_env(my_app, request_counter, RequestCounter),
    ok.

record_request(Method, StatusCode) ->
    {ok, Counter} = application:get_env(my_app, request_counter),

    %% Increment counter with attributes
    otel_counter:add(Counter, 1, #{
        <<"http.method">> => Method,
        <<"http.status_code">> => StatusCode
    }).

%% Usage:
%% my_app_metrics:record_request(<<"GET">>, 200).
%% my_app_metrics:record_request(<<"POST">>, 201).

Counter Best Practices

  • Only increment: Never decrease counter values
  • Use meaningful attributes: Add dimensions for filtering and grouping
  • Keep cardinality low: Avoid high-cardinality attributes such as user IDs, request IDs, or raw URLs (see the sketch below)
  • Choose appropriate units: Use standard units such as {request}, {error}, or By for bytes
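
The sketch below makes the cardinality advice concrete. It reuses the same illustrative API shape as the other Elixir examples in this guide; the normalize_route/1 helper and its route patterns are hypothetical stand-ins for whatever routing information your framework exposes.

Elixir:
defmodule MyApp.MetricsHygiene do
  # Hypothetical helper: collapse unbounded raw paths into a small, fixed
  # set of route templates so attribute cardinality stays bounded.
  def normalize_route("/users/" <> _rest), do: "/users/:id"
  def normalize_route("/orders/" <> _rest), do: "/orders/:id"
  def normalize_route(_other), do: "unmatched"

  def record_request(counter, method, raw_path) do
    # Good: method and route template are low-cardinality dimensions.
    # Avoid attributes such as the raw path, user IDs, or request IDs.
    OpenTelemetry.Metrics.Counter.add(counter, 1, %{
      "http.method" => method,
      "http.route" => normalize_route(raw_path)
    })
  end
end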

Creating and Using Histograms

Histograms capture the statistical distribution of values, perfect for measuring latencies, sizes, and durations:

Elixir:
defmodule MyApp.Metrics do
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  @request_duration @meter
                    |> OpenTelemetry.Metrics.create_histogram(
                      "http.server.request.duration",
                      description: "HTTP request duration",
                      unit: "ms"
                    )

  def record_request_duration(duration_ms, endpoint) do
    OpenTelemetry.Metrics.Histogram.record(
      @request_duration,
      duration_ms,
      %{
        "http.route" => endpoint
      }
    )
  end

  def measure_operation(operation_name, func) do
    start_time = System.monotonic_time(:millisecond)

    try do
      result = func.()
      duration = System.monotonic_time(:millisecond) - start_time

      record_request_duration(duration, operation_name)
      {:ok, result}
    rescue
      error ->
        duration = System.monotonic_time(:millisecond) - start_time
        record_request_duration(duration, operation_name)
        {:error, error}
    end
  end
end

# Usage
MyApp.Metrics.measure_operation("process_payment", fn ->
  # Business logic here
  :timer.sleep(150)
  :ok
end)

Erlang:
-module(my_app_metrics).
-export([record_request_duration/2, measure_operation/2, init/0]).

-define(METER, opentelemetry_metrics:get_meter(?MODULE)).

init() ->
    %% Create histogram instrument
    RequestDuration = otel_meter:create_histogram(
        ?METER,
        <<"http.server.request.duration">>,
        #{
            description => <<"HTTP request duration">>,
            unit => <<"ms">>
        }
    ),

    application:set_env(my_app, request_duration, RequestDuration),
    ok.

record_request_duration(DurationMs, Endpoint) ->
    {ok, Histogram} = application:get_env(my_app, request_duration),

    otel_histogram:record(Histogram, DurationMs, #{
        <<"http.route">> => Endpoint
    }).

measure_operation(OperationName, Fun) ->
    StartTime = erlang:monotonic_time(millisecond),

    try Fun() of
        Result ->
            Duration = erlang:monotonic_time(millisecond) - StartTime,
            record_request_duration(Duration, OperationName),
            {ok, Result}
    catch
        Class:Reason:Stacktrace ->
            Duration = erlang:monotonic_time(millisecond) - StartTime,
            record_request_duration(Duration, OperationName),
            erlang:raise(Class, Reason, Stacktrace)
    end.

%% Usage:
%% my_app_metrics:measure_operation(<<"process_payment">>, fun() ->
%%     timer:sleep(150),
%%     ok
%% end).

Creating and Using UpDownCounters

UpDownCounters track values that can both increase and decrease, such as active connections or items in a queue:

Elixir:
defmodule MyApp.ConnectionMetrics do
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  @active_connections @meter
                      |> OpenTelemetry.Metrics.create_updown_counter(
                        "db.connections.active",
                        description: "Active database connections",
                        unit: "{connection}"
                      )

  def connection_opened(pool_name) do
    OpenTelemetry.Metrics.UpDownCounter.add(
      @active_connections,
      1,
      %{"db.pool.name" => pool_name}
    )
  end

  def connection_closed(pool_name) do
    OpenTelemetry.Metrics.UpDownCounter.add(
      @active_connections,
      -1,
      %{"db.pool.name" => pool_name}
    )
  end
end

# Usage in connection pool
MyApp.ConnectionMetrics.connection_opened("main_pool")
# ... use connection ...
MyApp.ConnectionMetrics.connection_closed("main_pool")

Erlang:
-module(my_app_connection_metrics).
-export([connection_opened/1, connection_closed/1, init/0]).

-define(METER, opentelemetry_metrics:get_meter(?MODULE)).

init() ->
    %% Create UpDownCounter instrument
    ActiveConnections = otel_meter:create_updown_counter(
        ?METER,
        <<"db.connections.active">>,
        #{
            description => <<"Active database connections">>,
            unit => <<"{connection}">>
        }
    ),

    application:set_env(my_app, active_connections, ActiveConnections),
    ok.

connection_opened(PoolName) ->
    {ok, Counter} = application:get_env(my_app, active_connections),

    otel_updown_counter:add(Counter, 1, #{
        <<"db.pool.name">> => PoolName
    }).

connection_closed(PoolName) ->
    {ok, Counter} = application:get_env(my_app, active_connections),

    otel_updown_counter:add(Counter, -1, #{
        <<"db.pool.name">> => PoolName
    }).

%% Usage:
%% my_app_connection_metrics:connection_opened(<<"main_pool">>).
%% %% ... use connection ...
%% my_app_connection_metrics:connection_closed(<<"main_pool">>).

Creating and Using Gauges

Gauges capture point-in-time values that can arbitrarily change:

Elixir:
defmodule MyApp.SystemMetrics do
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  @memory_usage @meter
                |> OpenTelemetry.Metrics.create_gauge(
                  "process.memory.usage",
                  description: "Current memory usage",
                  unit: "By"
                )

  def record_memory_usage do
    memory_info = :erlang.memory()
    total_memory = Keyword.get(memory_info, :total, 0)

    OpenTelemetry.Metrics.Gauge.record(
      @memory_usage,
      total_memory,
      %{"memory.type" => "total"}
    )
  end
end

# Can be called periodically or on-demand
MyApp.SystemMetrics.record_memory_usage()

Erlang:
-module(my_app_system_metrics).
-export([record_memory_usage/0, init/0]).

-define(METER, opentelemetry_metrics:get_meter(?MODULE)).

init() ->
    %% Create Gauge instrument
    MemoryUsage = otel_meter:create_gauge(
        ?METER,
        <<"process.memory.usage">>,
        #{
            description => <<"Current memory usage">>,
            unit => <<"By">>
        }
    ),

    application:set_env(my_app, memory_usage, MemoryUsage),
    ok.

record_memory_usage() ->
    {ok, Gauge} = application:get_env(my_app, memory_usage),

    MemoryInfo = erlang:memory(),
    TotalMemory = proplists:get_value(total, MemoryInfo, 0),

    otel_gauge:record(Gauge, TotalMemory, #{
        <<"memory.type">> => <<"total">>
    }).

%% Usage:
%% my_app_system_metrics:record_memory_usage().

Asynchronous Instruments

Asynchronous instruments use callbacks to report values when metrics are collected, rather than being called directly in your code:

Elixir:
defmodule MyApp.AsyncMetrics do
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  def setup_async_metrics do
    # Asynchronous Gauge for system metrics
    @meter
    |> OpenTelemetry.Metrics.create_async_gauge(
      "system.cpu.utilization",
      description: "CPU utilization",
      unit: "1",
      callback: &cpu_utilization_callback/0
    )

    # Asynchronous UpDownCounter for queue depth
    @meter
    |> OpenTelemetry.Metrics.create_async_updown_counter(
      "queue.depth",
      description: "Number of items in queue",
      unit: "{item}",
      callback: &queue_depth_callback/0
    )
  end

  defp cpu_utilization_callback do
    # This function is called periodically by the metrics SDK
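    # :cpu_sup is part of the :os_mon application, which must be running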
    cpu_usage = :cpu_sup.util() / 100.0

    [
      {cpu_usage, %{"cpu.state" => "used"}},
      {1.0 - cpu_usage, %{"cpu.state" => "idle"}}
    ]
  end

  defp queue_depth_callback do
    # Query your queue system
    queue_size = MyApp.Queue.size()

    [{queue_size, %{"queue.name" => "main"}}]
  end
end

# Initialize once at application startup
MyApp.AsyncMetrics.setup_async_metrics()

Erlang:
-module(my_app_async_metrics).
-export([setup_async_metrics/0]).

-define(METER, opentelemetry_metrics:get_meter(?MODULE)).

setup_async_metrics() ->
    %% Asynchronous Gauge for system metrics
    otel_meter:create_async_gauge(
        ?METER,
        <<"system.cpu.utilization">>,
        #{
            description => <<"CPU utilization">>,
            unit => <<"1">>,
            callback => fun cpu_utilization_callback/0
        }
    ),

    %% Asynchronous UpDownCounter for queue depth
    otel_meter:create_async_updown_counter(
        ?METER,
        <<"queue.depth">>,
        #{
            description => <<"Number of items in queue">>,
            unit => <<"{item}">>,
            callback => fun queue_depth_callback/0
        }
    ),
    ok.

cpu_utilization_callback() ->
    %% This function is called periodically by the metrics SDK
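    %% cpu_sup is part of the os_mon application, which must be running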
    CpuUsage = cpu_sup:util() / 100.0,

    [
        {CpuUsage, #{<<"cpu.state">> => <<"used">>}},
        {1.0 - CpuUsage, #{<<"cpu.state">> => <<"idle">>}}
    ].

queue_depth_callback() ->
    %% Query your queue system
    QueueSize = my_app_queue:size(),

    [{QueueSize, #{<<"queue.name">> => <<"main">>}}].

%% Initialize once at application startup:
%% my_app_async_metrics:setup_async_metrics().

Practical Examples

HTTP Server Metrics

Complete example of instrumenting an HTTP handler:

Elixir:
defmodule MyApp.HTTPMetrics do
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  @request_counter @meter
                   |> OpenTelemetry.Metrics.create_counter(
                     "http.server.requests",
                     description: "Total HTTP requests"
                   )

  @request_duration @meter
                    |> OpenTelemetry.Metrics.create_histogram(
                      "http.server.duration",
                      description: "HTTP request duration",
                      unit: "ms"
                    )

  @active_requests @meter
                   |> OpenTelemetry.Metrics.create_updown_counter(
                     "http.server.active_requests",
                     description: "Active HTTP requests"
                   )

  def track_request(method, path, func) do
    # Increment active requests
    OpenTelemetry.Metrics.UpDownCounter.add(@active_requests, 1)

    start_time = System.monotonic_time(:millisecond)

    try do
      result = func.()
      status_code = get_status_code(result)

      duration = System.monotonic_time(:millisecond) - start_time

      # Record metrics
      attributes = %{
        "http.method" => method,
        "http.route" => path,
        "http.status_code" => status_code
      }

      OpenTelemetry.Metrics.Counter.add(@request_counter, 1, attributes)
      OpenTelemetry.Metrics.Histogram.record(@request_duration, duration, attributes)

      result
    after
      # Decrement active requests
      OpenTelemetry.Metrics.UpDownCounter.add(@active_requests, -1)
    end
  end

  defp get_status_code({:ok, _}), do: 200
  defp get_status_code({:error, _}), do: 500
  defp get_status_code(_), do: 200
end

Erlang:
-module(my_app_http_metrics).
-export([track_request/3, init/0]).

-define(METER, opentelemetry_metrics:get_meter(?MODULE)).

init() ->
    RequestCounter = otel_meter:create_counter(
        ?METER,
        <<"http.server.requests">>,
        #{description => <<"Total HTTP requests">>}
    ),

    RequestDuration = otel_meter:create_histogram(
        ?METER,
        <<"http.server.duration">>,
        #{description => <<"HTTP request duration">>, unit => <<"ms">>}
    ),

    ActiveRequests = otel_meter:create_updown_counter(
        ?METER,
        <<"http.server.active_requests">>,
        #{description => <<"Active HTTP requests">>}
    ),

    application:set_env(my_app, http_metrics, #{
        request_counter => RequestCounter,
        request_duration => RequestDuration,
        active_requests => ActiveRequests
    }),
    ok.

track_request(Method, Path, Fun) ->
    {ok, Metrics} = application:get_env(my_app, http_metrics),
    #{active_requests := ActiveRequests} = Metrics,

    %% Increment active requests
    otel_updown_counter:add(ActiveRequests, 1),

    StartTime = erlang:monotonic_time(millisecond),

    try Fun() of
        Result ->
            StatusCode = get_status_code(Result),
            record_metrics(Metrics, Method, Path, StatusCode, StartTime),
            Result
    catch
        Class:Reason:Stacktrace ->
            record_metrics(Metrics, Method, Path, 500, StartTime),
            erlang:raise(Class, Reason, Stacktrace)
    after
        %% Decrement active requests
        otel_updown_counter:add(ActiveRequests, -1)
    end.

record_metrics(Metrics, Method, Path, StatusCode, StartTime) ->
    #{
        request_counter := Counter,
        request_duration := DurationHistogram
    } = Metrics,

    Elapsed = erlang:monotonic_time(millisecond) - StartTime,

    Attributes = #{
        <<"http.method">> => Method,
        <<"http.route">> => Path,
        <<"http.status_code">> => StatusCode
    },

    otel_counter:add(Counter, 1, Attributes),
    otel_histogram:record(DurationHistogram, Elapsed, Attributes).

get_status_code({ok, _}) -> 200;
get_status_code({error, _}) -> 500;
get_status_code(_) -> 200.
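
For completeness, here is a usage sketch for the Elixir wrapper above; the route and handler body are hypothetical:

Elixir:
# The request counter, duration histogram, and active-request UpDownCounter
# are all updated around the handler call.
MyApp.HTTPMetrics.track_request("GET", "/users/:id", fn ->
  {:ok, %{status: 200, body: "..."}}
end)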

Database Connection Pool Metrics

Elixir:
defmodule MyApp.DBPoolMetrics do
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  def setup_pool_metrics(pool_name) do
    @meter
    |> OpenTelemetry.Metrics.create_async_gauge(
      "db.pool.connections",
      description: "Database pool connections",
      unit: "{connection}",
      callback: fn -> pool_stats_callback(pool_name) end
    )
  end

  defp pool_stats_callback(pool_name) do
    # :poolboy.status/1 returns {state_name, available_workers, overflow, checked_out}
    {_state, available, _overflow, checked_out} = :poolboy.status(pool_name)
    total = available + checked_out

    [
      {total, %{"state" => "total", "pool" => pool_name}},
      {available, %{"state" => "idle", "pool" => pool_name}},
      {checked_out, %{"state" => "active", "pool" => pool_name}}
    ]
  end
end

Erlang:
-module(my_app_db_pool_metrics).
-export([setup_pool_metrics/1]).

-define(METER, opentelemetry_metrics:get_meter(?MODULE)).

setup_pool_metrics(PoolName) ->
    otel_meter:create_async_gauge(
        ?METER,
        <<"db.pool.connections">>,
        #{
            description => <<"Database pool connections">>,
            unit => <<"{connection}">>,
            callback => fun() -> pool_stats_callback(PoolName) end
        }
    ).

pool_stats_callback(PoolName) ->
    %% poolboy:status/1 returns {StateName, AvailableWorkers, Overflow, CheckedOut}
    {_StateName, Available, _Overflow, CheckedOut} = poolboy:status(PoolName),
    Total = Available + CheckedOut,

    [
        {Total, #{<<"state">> => <<"total">>, <<"pool">> => PoolName}},
        {Available, #{<<"state">> => <<"idle">>, <<"pool">> => PoolName}},
        {CheckedOut, #{<<"state">> => <<"active">>, <<"pool">> => PoolName}}
    ].

Metric Naming Conventions

Follow OpenTelemetry semantic conventions for metric names:

  • Use . as namespace separator: http.server.duration
  • Use lowercase; separate words within a name component with underscores: http.server.active_requests
  • Only embed a unit suffix in the name (e.g., duration_ms) when you cannot set the unit parameter; the sketch after the unit list below shows the preferred approach
  • Follow semantic conventions: OpenTelemetry Semantic Conventions

Common Units

  • Time: ms (milliseconds), s (seconds)
  • Data: By (bytes), KiBy (kibibytes)
  • Ratios: 1 (dimensionless value from 0 to 1)
  • Counts: {request}, {connection}, {error}
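
Combining the naming and unit guidance above, the sketch below creates two instruments whose names use dot-separated namespaces and whose units come from the unit option rather than from name suffixes. It uses the same illustrative API shape as the earlier examples, and the checkout.* metric names are hypothetical.

Elixir:
defmodule MyApp.CheckoutMetrics do
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  # Dot-separated namespace; the unit lives in the unit option ("ms"),
  # not in a "_ms" suffix on the metric name.
  @checkout_duration @meter
                     |> OpenTelemetry.Metrics.create_histogram(
                       "checkout.duration",
                       description: "Checkout processing time",
                       unit: "ms"
                     )

  # Countable things use an annotation unit such as "{request}".
  @checkout_requests @meter
                     |> OpenTelemetry.Metrics.create_counter(
                       "checkout.requests",
                       description: "Checkout requests received",
                       unit: "{request}"
                     )

  def record_checkout(duration_ms) do
    OpenTelemetry.Metrics.Counter.add(@checkout_requests, 1, %{})
    OpenTelemetry.Metrics.Histogram.record(@checkout_duration, duration_ms, %{})
  end
end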

Best Practices

Attribute Cardinality

Keep attribute cardinality low to avoid memory issues:

Good ✅:

%{"http.method" => "GET", "http.status_code" => 200}

Bad ❌:

%{"user.id" => "12345", "request.id" => "abc-def-ghi"}  # Too many unique values

Metric Selection

Choose the right instrument:

  • Total requests: Counter
  • Request duration distribution: Histogram
  • Active connections: UpDownCounter or Asynchronous Gauge
  • Current temperature: Gauge or Asynchronous Gauge
  • Total bytes sent: Counter
  • Queue length: UpDownCounter or Asynchronous Gauge

Performance Tips

  • Create instruments once at startup, not per request (see the sketch below)
  • Use asynchronous instruments for polled data
  • Batch metric updates when possible
  • Keep attribute values simple (strings, numbers, booleans)
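
One way to apply the first tip is sketched below: the instrument is created a single time at startup and cached in :persistent_term, so hot-path code only reads it. The MyApp.Telemetry module and the jobs.processed metric are hypothetical, and the creation call mirrors the illustrative API shape used throughout this guide.

Elixir:
defmodule MyApp.Telemetry do
  @meter OpenTelemetry.Metrics.get_meter(__MODULE__)

  # Call once from Application.start/2, before any work is processed.
  def setup do
    counter =
      OpenTelemetry.Metrics.create_counter(
        @meter,
        "jobs.processed",
        description: "Background jobs processed",
        unit: "{job}"
      )

    # :persistent_term reads are cheap, so per-event code never re-creates
    # or re-looks-up the instrument.
    :persistent_term.put({__MODULE__, :jobs_processed}, counter)
  end

  def job_processed(queue_name) do
    counter = :persistent_term.get({__MODULE__, :jobs_processed})
    OpenTelemetry.Metrics.Counter.add(counter, 1, %{"queue" => queue_name})
  end
end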

What's next?