Metric monitors
A metric monitor evaluates a UQL expression over a rolling time window and fires an alert when the computed value exceeds a threshold. Any metric Uptrace collects is available: OpenTelemetry host metrics, custom application metrics, and the internal span and log metrics Uptrace generates automatically.
How evaluation works
Each metric monitor defines:
- Metrics: one or more metric aliases used in the query expression.
- Query: a UQL expression that produces a single numeric value per evaluation cycle.
- Threshold (max_value): the value above which an alert fires.
- Evaluation points (num_eval_points): how many consecutive data points must exceed the threshold before firing. Higher values reduce noise from spikes.
Uptrace evaluates the expression on a fixed schedule. Once the value has exceeded the threshold for num_eval_points consecutive data points, it creates an alert and sends a notification. If the value then recovers below the threshold, the alert closes and a recovery notification is sent.
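To make the interaction between max_value and num_eval_points concrete, the detector fragment below traces a made-up series of values in comments. The numbers are illustrative only, and the trace assumes that a single point at or below the threshold resets the consecutive count, which is one reading of the rule described above rather than a confirmed detail of the implementation.

```yaml
# Hypothetical detector settings; the numbers are illustrative only.
detector:
  type: manual
  max_value: 0.9       # a point breaches when the query value is above 0.9
  num_eval_points: 3   # fire only after 3 consecutive breaching points

# Assumed sequence of evaluated values, one per evaluation cycle:
#   0.95  breach      (1 consecutive)
#   0.85  no breach   (count resets to 0)
#   0.92  breach      (1 consecutive)
#   0.93  breach      (2 consecutive)
#   0.94  breach      (3 consecutive, alert fires here)
#   0.70  no breach   (alert closes, recovery notification is sent)
```

With num_eval_points set to 1, the same series would have fired on the very first value, which is why higher settings suit spiky metrics.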
To create a monitor, go to Alerting → Monitors → New monitor → From YAML and paste one of the examples below.
Infrastructure metrics
The following monitors work with the OpenTelemetry Host Metrics receiver.
CPU usage:
monitors:
  - name: CPU usage
    type: metric
    metrics:
      - system_cpu_load_average_15m as $load_avg_15m
      - system_cpu_time as $cpu_time
    query:
      - $load_avg_15m / uniq($cpu_time.cpu) as cpu_util
      - group by host_name
    column:
      name: cpu_util
      unit: utilization
    detector:
      type: manual
      max_value: 3
      num_eval_points: 10
Filesystem usage:
monitors:
  - name: Filesystem usage
    type: metric
    metrics:
      - system_filesystem_usage as $fs_usage
    query:
      - $fs_usage{state='used'} / $fs_usage as fs_util
      - group by host_name, mountpoint
      - where mountpoint !~ "/snap"
    column:
      name: fs_util
      unit: utilization
    detector:
      type: manual
      max_value: 0.9
      num_eval_points: 3
Disk pending operations:
monitors:
  - name: Disk pending operations
    type: metric
    metrics:
      - system_disk_pending_operations as $pending_ops
    query:
      - $pending_ops
      - group by host_name, device
    detector:
      type: manual
      max_value: 100
      num_eval_points: 10
Network errors:
monitors:
  - name: Network errors
    type: metric
    metrics:
      - system_network_errors as $net_errors
    query:
      - $net_errors
      - group by host_name
    detector:
      type: manual
      max_value: 0
      num_eval_points: 3
Span and log metrics
Uptrace generates three internal metrics from your tracing pipeline that you can query in metric monitors:
| Metric | Description |
|---|---|
| uptrace_tracing_spans | Span count and duration. Excludes events and logs. |
| uptrace_tracing_logs | Log record count. Excludes spans and events. |
| uptrace_tracing_events | Event count. Excludes spans and logs. |
All span attributes are available for filtering and grouping — for example where _status_code = 'error' or group by service_name. See Querying spans for the full attribute reference.
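As a minimal sketch, the two clauses mentioned above can be combined in a single monitor that tracks errored spans per service. The monitor name, the error_rate alias, and the threshold are hypothetical values chosen for illustration; only the metric name and the filter and grouping attributes come from the text above.

```yaml
monitors:
  - name: Span errors per service # hypothetical monitor name
    type: metric
    metrics:
      - uptrace_tracing_spans as $spans
    query:
      - perMin(count($spans)) as error_rate
      - where _status_code = 'error' # keep only errored spans
      - group by service_name # one timeseries, and one alert, per service
    detector:
      type: manual
      max_value: 5 # assumed threshold, in errored spans per minute
      num_eval_points: 3
```

Grouping by service_name means each service is evaluated against the threshold separately, so one noisy service does not hide or trigger alerts for the others.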
PostgreSQL SELECT duration:
monitors:
  - name: PostgreSQL SELECT duration
    type: metric
    metrics:
      - uptrace_tracing_spans as $spans
    query:
      - avg($spans)
      - where _system = 'db:postgresql'
      - where db_operation = 'SELECT'
    detector:
      type: manual
      max_value: 10 # milliseconds
      num_eval_points: 5
Database operation latency (p50):
monitors:
  - name: Database operations duration
    type: metric
    metrics:
      - uptrace_tracing_spans as $spans
    query:
      - p50($spans)
      - where _type = "db"
    detector:
      type: manual
      max_value: 10 # milliseconds
      num_eval_points: 5
Log error rate:
monitors:
  - name: Number of errors
    type: metric
    metrics:
      - uptrace_tracing_logs as $logs
    query:
      - perMin(sum($logs))
      - where _system in ("log:error", "log:fatal")
    detector:
      type: manual
      max_value: 10
      num_eval_points: 3
Failed HTTP requests:
monitors:
  - name: Failed requests
    type: metric
    metrics:
      - uptrace_tracing_spans as $spans
    query:
      - perMin(count($spans{_status_code="error"})) as failed_requests
      - where _type = "httpserver"
    detector:
      type: manual
      max_value: 0
Alert names
For metric monitors, Uptrace generates alert names using the monitor name and timeseries name, for example, "Disk usage: myhost+mydisk".
For error monitors, Uptrace generates alert names using the error (log) message, for example, "ERROR *fmt.wrapError: writeError failed".
You can customize alert names by specifying a Go template string as the monitor name when creating a monitor. For example, {{ .Attrs.deployment_environment_name }}: {{ .DisplayName }} prefixes the alert name with the deployment environment attribute.
You can use the following variables in templates:
| Variable | Type | Description |
|---|---|---|
| {{ .DisplayName }} | string | Same as _display_name when querying spans and logs. |
| {{ .Attrs }} | map[string]any | All available attributes, for example, {{ .Attrs.service_name }}. |
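A templated name in a YAML monitor definition might look like the sketch below. It reuses the failed-requests query from earlier on this page and makes two assumptions: that the name field in YAML accepts the same template string as the monitor form, and that deployment_environment_name is present on the spans being counted.

```yaml
monitors:
  # The name is a Go template. Single quotes keep YAML from parsing "{{" as a flow mapping.
  - name: '{{ .Attrs.deployment_environment_name }}: {{ .DisplayName }}'
    type: metric
    metrics:
      - uptrace_tracing_spans as $spans
    query:
      - perMin(count($spans{_status_code="error"})) as failed_requests
      - where _type = "httpserver"
      - group by deployment_environment_name # assumed to make the attribute available to the template
    detector:
      type: manual
      max_value: 0
```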